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Supplementary methods 

As we noted in [22] , a bra-vector (e.g ., (s |), a ket-vector (e.g., |.v')), and a linear operator 

( e.g ., M ) could be considered as the convenient reminders of a row vector, a column vector, 
and a matrix, respectively, in the normal formulation of a continuous-time Markov model, if 

you want. Especially, the operator, M I (v, /), represents the insertion of / sites between the 

x -th and (x +1 )-th sites, and the operator, M D (x H , x E ), represents the deletion of the 
subsequence between (and including) the x B -th and x E -th sites. 


SM-1. Perturbation expansion of multiplication factor for local alignment: details 

Here, we give details of the mathematical expressions in section R1 of the main Results and 
discussion, from the beginning down to Eq.(R1.6). 


As in section Rl, let P (a(s A ,s £l ), [t n t F fj | (s' 4 , t ,)j be the probability that a PWA 

(a(s A , s D )) between an ancestral sequence state (s A ) and a descendant (s D ) result from 
evolution during a time interval ([t n t F ]), given s A at t 7 . (We ignore the residue states.) In 

[22] , we formally proved that P (a(s' 4 ,s D ), [t n t F fj | (s A , t 7 )j is given as a series: 


'v=au[«(A* d )] (JV) 


(a(s' 4 ,s D ),[t / ,t F ])| (s A ,t 7 )] = 

— Eq.(SM-l.l) 

(It corresponds to Eq.(Rl.l).) Here, A min [a(s A ,s D )] 


(a(s A ,s D ),[t / ,t f .])| (s A ,t 7 )], 

is the minimum number of indels 


required to create a(s A ,s D ). The term P {N) (a(s A ,s I> ), [t n t F ]j | (s A , t 7 )j is the fraction of 


(a(s A ,s D ),[t / ,t /r ])| (s A ,t 7 )] 


contributed from all N -event indel histories that can result in 


a(s A , s D ). Let H /D [N; a(s A , s D )j be the set of such N -event indel histories, and let 

[M,, M 2 , •••, M n ] bean A-event indel history, where each M v (v = 1,2,..., N ) is an 
operator representing the v -th indel event in the history. Then, the above term is expressed 
as: 
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<W) 


'(«( 5 A , 5 D ),[r / ,^])|( 5 A ,? / )] = 


[M t ,M 2 ,--,M N ] 

GH ,D [iV;a(/,i 0 )] 


([ 


AfpM,,' 


M N Ut n t F ])\(s A ,t I ) 


— Eq.(SM-1.2) 

Each term on the right hand side of this equation is given as: 
P 


([•Mp M 2 ,--,M n ], [tj,t F fj | (s A , tj) 

/•••/ dr l ...dr N \T^' v _r{M v \s v _ p t v )) exp j JJ V+ ' dr R"\s v , r) j 


h ~ T o <T \ <m ■ ' <t N <t N+\ -t F 


v=0 


|(s v |-(s v _ 1 |M v | v=l.v} 


— Eq.(SM-E3) 

Here, r(M v ; s v _ n r v ) is the (generally time- and state-dependent) rate that the sequence state 
(A’ v _!) undergoes the indel ( M v ) at time r v , and Rf(s v , r) is the (generally time- and 
state-dependent) exit rate of the state (s v ) at time r . These equations, Eq.(SM-1.2) and 

Eq.(SM-1.3), are essential when we calculate the probability terms under specific 
situations. 

Now, using some (but not necessarily all) preserved ancestral sites (PASs), we partition 
the PWA, a(s A , s D ) , into “local regions” (/.<?., inter-PAS regions), y,, y 2 ,..., y , in which 
all potentially causative indels are confined. In [22], we derived the two conditions. 
Condition (i): each indel rate parameter is independent of the portion of the sequence state 
outside of the local region where the indel occurred; and 

Condition (ii): the increment of the exit rate due to each indel event is independent of the 
portion of the sequence state outside of the local region where the indel occurred. 

Under these conditions, the above PWA probability, Eq.(SM-l.l) supplemented with 
Eqs.(SM-1.2,3), can be factorized as: 


= P 


’ (a(s A ,s z> ),[t / ,t J r])| Cs A ,fj)] 

^max 

([]. \tj,t F ]) | (s A , f 7 )] (cc(s A , s D ), [t n t F ]} | (s A , f 7 )] 


— Eq.(SM-1.4) 


(It corresponds to Eq.(R 1.2).) Here, P ([], [tj,t F ]) | (s A , t ; )j = exp|-J F dr Rf (s A , r)| is the 
probability that the sequence underwent no indels during given ,v' 4 at /,. And 

fi P [y K ;(a(s A ,s D ),[t I ,t F ])\(s A ,t I )], which was denoted as 


Pp 


(A /D [y K ; a(s A , 5°)], [t n t F ]j | (s A , tj ) in [22], is the multiplication factor contributed from 
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the local region, y K . The multiplication factor is a summation of contributions over all local 


indel histories that can yield the local PWA confined in y K . (The symbol, A /D \j K \ a(s A , ,y /J )j, 

denotes the set of all such local indel histories.) Thus, it can also be expressed as a series 
similar to Eq.(SM-l.l): 

ftp [/*:’ > S )» | ( 5 » ^/)j = ^ V_ A , s Dy r ^ U P{N) [iT’ ( a ( S > S ); [tlitp ]) | (5 

— Eq.(SM-1.5) 

(It corresponds to Eq.(Rl .3).) Here, N min [cf(s A ,s Z) ); y K j is the minimum required number of 


indels in y K . And the term [i P(N) yy K ] (a(s A , s D ), [t n t F fj | ( s A , t { ) j is the portion of the 
multiplication factor contributed from all local-PWA-consistent N -indel local histories in 
y K . Letting A ID ^N; y K \ a(s A , .v H )j be the set of all such local histories, the term is expressed 
as: 


t^P(N) |Vk’ ^ )j | (s ,t 7 )j 


Pf 




\[M i ,M 2 ,--,M N ],[t I ,t F ])\(s A ,t I ) 


Eq.(SM-1.6) 


where the probability quotient for a (local) indel history is defined as: 


Pp 

m P 


\[M v M 2 ,--,M N Ut n t F i) ^V,) 


([M 1 ,M 2 ,-- s M w ],[r / ,t f ])|( 5 A ,t / )]/p[([],[r / ,t F ])|( 5 A ,t / )] 


. — Eq.(SM-1.7) 


Similar arguments hold also for the probability, /’[afs,, s 2 ,..., s ^ x ] | 7j, that a MSA 

(a[Sj, s 2 ,..., ^xl) of N x sequences, s p s 2 ,..., s nX , results from the evolution along a given 
phylogenetic tree (7’)[22|. Basically in line with the idea in [18,19, 40], we can build up the 
probability of a MSA, first by multiplying the root state probability and the probabilities of 
ancestor-descendant PWAs along branches, and second by summing such products over all 
MSA-consistent ancestral states. The MSA probability thus composed can be expressed as a 
series: 

P[a[ W2 ,..., V ]|r] = ^ N _ N [«[*!, s 2 ,..., vl| T ]- — Eq.(SM-1.8) 
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(It corresponds to Rq.(Rl .4).) Here, N min is the minimum number of indels required for 
creating the MSA. (For simplicity, we omitted the obvious dependence of /V min on the MSA 

and the tree.) And R (A , ) [«[s 1 , s 2 ,..., | T j is the portion of the probability contributed from 


all MSA-consistent N -event indel histories along T . Let l F[./V; a[s 1 ,.s' 2 ,...,s\ vX ]; T j be 

the set of all such indel histories. Then, we have: 

P w [a[Si,s 2 ,...,s jv J|r]= J P| 

('-■{*»},) 

e>P®[v;a[. !l , S2 ,..., V ];7’] 

— Eq.(SM-1.9) 


(s Root , n Roo, \ 

P 

\ti{b) ] 

(s Root ,n Root ) 

. \ /. 


_[ J T 

\ J 


Here, j M(b )j J denotes an indel history along T that starts with the sequence state 

( s Root ) at the root node ( n Root ) and that consists of (mutually interdependent) indel histories 
( M(b )’s) along branches ( b ’s). ( M(b ) will also be denoted as: M x (b),---, M W(M ((7)j.) 



^ Root 1 


n 


denotes the (prior) probability that we have s Root at n Root . And 


\M(b) J [s Root ,n Root ) 

specific expression is: 

["<»}. ( 


is the probability that all M(b) ’s occur, given s at n'" .Its 


Root Root 1 

S , n 


I 


)- n p 

\b<E{b} T 


M(b),b\ ( s A (b),n A (b )) 


[s D (b)\-/s A (b)\M l (by~M N(b) (.b) 


for 'biEWr 


— Eq.(SM-l.lO) 
with 


M(b),b\ ( s A {b),n\b )) 


- P 


M x (b),---,M m) (b)],[t(n A (b)), t(n D (b))])\(s A (b),t(n A (b))) 


®iD(b) 


— Eq.(SM-l.ll) 

Here \b} T is the set of all branches in T. n A {b) and n D (b) are the upstream and 
downstream nodes, respectively, of branch b ; s A (b) and s n (b) are the sequence states at 
the respective nodes. And Eq.(SM-l.ll) explicitly records the dependence on the (possibly 
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branch-dependent) model parameter setting ( ® ID (b )). Eqs.(SM-1.10,ll) give essential 
building blocks for MSA probabilities. 

A MSA-counterpart of a PAS is a gapless column, which indicates that the 
corresponding site was hit by no indel throughout the evolution along T . (Hereafter, a 
gapless column in a MSA is also called a “PAS.”) Using some PASs, we partition the MSA 
into local regions, C,, C 2 ,..., C K . Meanwhile, there are infinitely many possible root 

sequence states ( s Root ’s) consistent with the MSA. Among them, we choose one as the 
“reference” root state ( s Root ). Then, in addition to the aforementioned conditions (i) and (ii), 
we impose the following condition. 

Condition (iii): the (prior) probability of each root state ( s Root ) is given by the probability of 
s Root multiplied by the product of factors over the local regions, where each factor depends 
only on the difference between s Root and s Root in a local region. 

Under these conditions, the MSA probability is factorized as: 


lv max ~ 

P[cc[ Sl , s 2 ,..., v ]| T] = P 0 [s Root | s 2 ,..., V']; C"; C k \t], - Eq.(SM-1.12) 


(It corresponds to Eq.(R1.5).) Here, 


p 0 [C”|r]-p[(sb i 


Root Root 1 

n 




exp|-y f' (n ° ,]) dr R l x(s Ro0 ', r) 
F | Zjbe{b} T J t ( n\b)) xy ° ’ 


— Eq.(SM-1.13) 


is the probability that the root sequence state is s Root and that it was hit by no indel all across 


T. And | T] is the multiplication factor contributed from the 


local region, C K . As in Eq.(SM-l .8), the multiplication factor also can be expressed as a 
series: 


M^[afSp s 2 ,..., s nX ]; 5 0 ; C K | rj - ^ N=N [ CK ]^P('V)[ a [ i i’ 5 2 ’•••’ 5 jv x ]’ 5 o ■> C K \ • 


— Eq.(SM-1.14) 

(It corresponds to Eq.(Rl .6).) Here, N min [C K ] is the minimum required number of indels in 
C K . And the term M P(JV) [«[.s P .s’ 2 ,..., .s vV ]; s Ro °'; C K | rj is the fraction of the multiplication 


factor contributed from all local-MSA-consistent N -indel local histories in C K . Letting 
A™ [N; C k ; rj be the set of all such local indel histories (accompanied by the 


root states), the term is expressed as: 
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M, (J0 [^[^i 


, s 2 ,..., s nX 


2 

EAS>[iV; C K ;a[s lA .j^];r] 


] . Root I rwi "I 

; s 0 ; c K | rj - 

V P [s Roo, ,s«° 0 ',n R °°'-,C K ]M f 


{*<*>}, 


(s* 00 ', //°°') 


x exp 


- 2 f, 

bE{b} T 


’(«) 

(*)) 


dr^f(s A (fc),^,r)[CK] 


^ D (fo)J-(^(fo)|^ 1 (&>-A^ ivc6) (fc) 

/or 


. — Eq.(SM-1.15) 

Here, 57?™ (s A (b), s Root , r)[C K ] is the difference of the exit rate of .s '(5) from that of s Root . 
And u 7 , [s Roor , ,y , n A> ""'; C K j is the (multiplicative) difference in the (prior) probability at 


n Ro °' between s Root and s Root . Both of them originated from C K . (It should be noted that, 
in Eq.(SM-1.15), s A (b) differs from s Root only within C K .) And the “raw” factor, 



r - 


M p 

\M(b)\ 

(s^ 00 ', n Roo> \ 


[ J T 


M p 

\mq>) } 

(s Ro °', n Roo, \ 


[ ) T 

\ / 


, is given as: 

t 


n ^ p 

bE{b} T 


M ( b ), b 


(,s A (b),n A (b )) 


i\ 


J / 


for *bEm T 


— Eq.(SM-1.16) 

See subsection 4.2 of [49] for the derivation of almost identical (but slightly different) 
equations. 


SM-2. Analytical expressions of parsimonious and next-parsimonious contributions (1): 
for local PWAs 

In case (i) (Figure SI a), the sequence states could be represented as s A = s D = [L, 7?]. In this 
case, N mm \j K \ a(s A , s D )j = 0 , and thus there is only one parsimonious indel history, [ ], 


where no indel event takes place. Therefore, in this case, total parsimonious contribution to 
the multiplication factor is: 

u P(0) [case (/)] = 1. — Eq.(SM-2.1) 

In this case, there is no history consisting only of one indel event. And each 


r - /v n 

next-parsimonious history should be a two-event history of the form, M / ( 1, /), M D ( 2, l +1) 

with l = 1,...,L C0 , where L co = min{L 7 °, L™} . Thus, the total next-parsimonious 
contribution is: 
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L 

Hpa)[case (i)\ = ^u r ( M 7 (l,/), M D (2,Z + 1)], [tj,t F ]\ | (s A , t 7 ) 

M L 


Eq.(SM-2.2a) 


Let (s[Z]|-(s a |m,(U) be the state resulting from the action of an insertion of / sites on 


s A . Then, using Eq.(SM-1.3) and Eq.(SM-1.7), each summand is calculated as: 


H P 


([M;(U), M d (2,1 + 1)], [t n t F ]) | h) 

r,(l,l;s A ,t)r D (2,l + l; s[l],t') 

x expj-J' dr dR^(s A ,s A ,r)- J dr dR'^(s[I],s A ,r)-J r dr 8Rf (s D ,s A ,r) 
= f r dtf F dt' r 7 (l,/; s A ,t) r D (2,l + l; s[l],t') expj-J dr 5Rf (s[/],.s' A ,T)j. 


= f dtf r dt' 


— Eq.(SM-2.2b) 

Here, 6Rf (s, s', t) = Rf (s, t) - Rf (s', t ) is the increment of the exit rate. The second 
equation of Eq.(SM-2.2b) follows from dR'^(s A ,s A ,r) = dR^(s D ,s A ,r) = 0 . We could at least 
numerically calculate Eq.(SM-2.2b) once the specific functional forms of the indel rates and 
the exit rates are given. For example, in a space-time-homogeneous model, like Dawg’s indel 
model [32] (see Eqs.(Rl .7,8,9) in the main text), we have 6R™ (s[l],s A ,r) = (A 7 + A D )l, and 
Eq.(SM-2.2b) is calculated as: 


{j i M I (l,l),M D (2,l + l)],[t I ,t F ^\(s A ,t I ) 

exp(-(A 7 + k D )l(t F -tj))- 1 + (A, + A d )l(t F -tj) 


fi P 

= Aj/j(/) A d / d (/) 


((A /+ A D )lf 


— Eq.(SM-2.2b’) 


Eq.(SM-2.2b’) (or Eq.(SM-2.2b) itself) indicates that 


lip 


M I (l,l),M D (2,l + l)],[t I ,t F ]^\(s A ,t I ) 


<\k I f I (l)k D f D (l)(t F -t i y for each l = l,...,L C0 . 


Applying this inequality to Eq.(SM-2.2a) and using another inequality, 

if 0 lfj° yt I^d 

2 „ m /»«)=22 t {l) 2 ,) - 1 .»<= have: 

u P(2) [case (/)] < |A 7 A d ( t F - tj) 2 . — Eq.(SM-2.3) 

Empirically, the rate of indels (A 7 + A D ) is estimated to be at most on the order of 1/10 of the 
substitution rate [24,35]. Eq.(SM-2.3) indicates that, in case (i), even if the elapsed time 
( t F - tj ) is such that the substitution process is nearly saturated (e.g., A v (/ f -t l )~4, where 
A s is the total substitution rate per site), Eq.(SM-2.2a) is at most on the order of 1/10 of 
Eq.(SM-2.1). Thus, in case (i), we expect that the parsimonious contribution should 
approximate the entire multiplication factor (Eq.(R1.3)) very well, as far as a single inter-PAS 
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position is concerned. [NOTE: Incidentally, for a gapless PWA segment consisting of L p (> 
2) PASs, the multiplication factor is formally given between Eq.(1.2.3) and Eq.(1.2.4) in 
[43].] 

In case (ii) (Figure Sib), we assume that the ancestral state has A L A sites in 
between the flanking PASs. Thus, the ancestral and descendant states could be represented as 

s A = [l, 1,...,AL a , and s D = [L, R ], respectively. As long as A L A <L C ^, 


A mi n [ case (0] = 1, and there is only one parsimonious indel history, M D ( 2, A L A + l)j, which 


consists of a single event that deletes the ancestral sites in between the PASs (Figure S2a). 
Therefore, the contribution by the parsimonious indel history is: 

f.i P{l) j case (ii); AL a j = J F dt r D (2,AL A +1; s A ,t) exp j-J' dr dRy D (s A ,s A ,t)~ J r dr dR 1 ^ (s D ,s A ,t) 
= J F dt r D (2,AL A + 1; s A ,t ) expj-J* F dr dR 1 ^(s D ,s A ,t )| • 


— Eq.(SM-2.4) 

Each next-parsimonious indel history is composed of two indel events. There are two types. 

r - l a 

(a) Two successive deletions, M D (x,x + l-l),M D (2,AL -/ + 1) with / = 1,...,AL -1 and 
x = 2,..., A L a -1 + 2 (e.g.. Figure S2b). And (b) an insertion followed by a deletion, 


Mj(x,l), M d (2,AL a + / + 1)] with l = 1,..., min{L) u , L L D U - AL A } and x = 1,..., AL A +1 (e.g., 
Figure S2c). Thus, in this case, the total next-parsimonious contribution is given by: 
fA P( 2 )[ case 00; AL A ] = /^p[(«); AL A j + [i p [(b); AL A j . —Eq.(SM-2.5a) 


Here, 


AL a -1 AL a -1+2 


gi P [(a); AL a ]= ^ ^ ([M D (x,x + l-l),M D (2,AL A -l + l)],[t I ,t F ] ] j\( S A ,t I ) 

1=1 x=2 L 

— Eq.(SM-2.5b) 

is the sum of contributions from the histories of type (a). And 

min{L™,L™-AL A }AL A +l = 

l* P [(by, AL a ]= J 2 ^ ([M I (x,l),M D (2,AL A +l + l)],[t I ,t F Vj\(s A ,t I ) 


1 =1 X=1 

— Eq.(SM-2.5c) 

is the sum of contributions from the histories of type (b). Fet 

(s A ■ [.v, — /1 = (s A |m o (x,x +1 -1) be the intermediate state in each type (a) history. Then, the 
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history’s contribution is calculated as: 




( M d (x, 


)],[^ f ])|(s a ,0 


x + l-1), M d (2,AL a -Z + l 


r D (x,x + l- 1; s A ,t) r D (2,AL A -l + l;s A - [x,-I],t’) 


/ h ^ r f F 

*' * xexp|-J dr SR^ D (s A -[x,-/], s A ,r)~ J F dr SR^ D (s D ,s A ,r)j 

— Eq.(SM-2.5d) 

Similarly, let (s A • [x,+Z]| = (s A |a/ 7 (x,Z) be the intermediate state in each type (b) history. 


Then, the history’s contribution is calculated as: 




( [M I (xJ),M D (2,AL A +l + l)],[t n t F ])\(s A J I ) 

r,(x,l; s A ,t ) r D (2,AL A + Z +1; s A - [x, +l],t') 
x expj-J dr dR ™ (s A -[x,+Z], s A ,r)~ J r dr 8Rf ( s D ,s A ,r) 


= J ' dt J ' dt' 


— Eq.(SM-2.5e) 

Eq.(SM-2.4) and Eqs.(SM-2.5a-e) can indeed be calculated at least numerically, given 

concrete indel rates and exit rates. For example, under Dawg’s indel model (Eqs.(Rl .7,8,9)), 

we have 8R'^(s d ,s a , r) = - (A, + A D )A L A , and Eq.(SM-2.4) becomes: 

r , exp ((A. + X d )AL a (tp-t,))-! 

ZV) [case («); A^] = Kf D ( AL A )- 1 A ’ ■ - Eq.(SM-2.4’) 

"I" J /\l j 

Similarly, using dR^(s A -[x,-l],s A ,r) = -(A / + A D ) Z and 

8R'x(s a - [x,+l],s A ,r) = + (A 7 + A d ) Z, Eqs.(SM-2.5d,e) under Dawg’s model are calculated, 
respectively, as: 


fi P 


M 


) (x,x + l-l),M D (2,AL A -l + l)],[t I ,t F ])\(s A ,t I ) 


A d / d (/)A d / d (AL a -Z) 


(A /+ A d )(A L A -l) 


(A/+A d )AL ( tf-tj ) 


-1 e 


( A.j+A. D )l(t F -tj ) 


-1 


(A 7 + A D )A l a 


(A, + A d )Z 


Eq.(SM-2.5d’) 


lip 


([m,(*,Z), M d (2,AL a +1 + 1)], [f 7 ,f F ]) | (s A , t,) 


a,/,(/) A 7J /„(AL 4 + Z) 


(A 7 +A d )(AL a + Z) 


3 (A 7 +A d )A L A (t F -tj) ^ 


(A 7 + A d )AL a 


(A 7 + A d )Z 


— Eq.(SM-2.5e’) 


Substituting Eqs.(SM-2.5d’,e’) into Eqs.(SM-2.5a,b,c), we can concretely calculate the total 
next-parsimonious contribution. Figure S4 shows the ratio of Eq.(SM-2.5a) to Eq.(SM-2.4), 
as a function of A L A (abscissa) and the expected number of indels per site ((A, + A D )(t F -t,), 
different curves). Here, the exact overall indel rate (A 7 + X D ) does not matter, because the 
probabilities are invariant under the simultaneous rescaling of the rate and the time interval 
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( t F -t,) that keeps (A, + A D )(t F -t,) unchanged. As indicated by Figure S4, each curve for a 
fixed (A, + k D )(t F -tj) reaches an asymptotic value slightly above 1 (unity) when A L A is 
sufficiently large. Thus, to define a threshold within which the parsimonious histories alone 
are likely to give a decent approximation of the multiplication factor, it is risky to use the 
point at which the ratio is 1 (unity). Here, we tentatively define the threshold, (A L) ( 0 N p , as the 
value of A L A at which the ratio is 0.5. With this definition, (A Lf^P is around 128, 31,12, 

6 and 3 when (A, + k D )(t F -t,) is 0.01,0.04, 0.1,0.2 and 0.4, respectively (Table SI). Hence, 
we have a rough inversely proportional relationship: (A L) ( 0 N p ~ 1.2/[(A / + k D )(t F -/,)], under 
the parameter setting used here. 

In case (iii) (Figure Sic), we assume that the descendant state has A L° sites in 
between the flanking PASs. Thus, the ancestral and descendant states could be represented as 


s A = [L, 7?] and s D = [l, i?], respectively. The ancestries satisfy vf (£ {L, R) 

for every i = 1,...,AL°, and v' 1 * v'] for every pair (/', j) with i * j , and their details 
depend on the responsible indel history. (Actually, such details don’t matter in the state space 
S n , as explained in [22] .) As long as A L° < L c ,° , A min [case (in)] = 1, and there is only one 

parsimonious indel history, M t (1, AL /J )j. The history consists of a single event that inserts 


the descendant sites in between the PASs. As in case (ii), each next-parsimonious indel 
history is composed of two indel events, and classified into two types, (c) Two successive 


insertions, 


M 


(1,A L° -l),Mj(x,lj\ with / = 1,...,AL d -1 and x = 1,..., AL° -l + l. And (d) 


an insertion followed by a deletion, 


M / (1,AL d + /),M d (*,x + /-1)] with 
l = 1,..., min{L™, L™ - A L°} and x = 2,..., A L° + 2 . The total parsimonious contribution 
( u P(i) | case ( iii ); AL° j) and the total next-parsimonious contribution (u /)(2) [case (iii); AL n j) 


can be calculated as in case (ii). And their calculations are detailed in Appendix A 1.1 of [43]. 
If calculated under the same setting as used for Figure S4 and with the same value of 
(A, + A D )(t F -1 ,), their ratio with A L° = L is identical to that in case (ii) with A L A = L , 
because of the symmetry between the probabilities under the time reversal. Thus, the same 
conclusions as in case (ii) can be drawn from Figure S4 also in this case. 

In case (iv) (Figuer Sid), we assume that the ancestral and the descendant states 
have A L A and A L D sites, respectively, in between the flanking PASs. Thus, the ancestral 
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and descendant states could be represented as s A = [L, 1,...,AL a , and 


s D = |^L, /?], respectively. Here, the descendant ancestries satisfy 

vf (£ {L, 1,...,AL a , R} for every i = 1,...,AL /J , and vf * vf for every pair (z, j) with i*j, 
and their details depend on the responsible indel history. (Again, the details don’t matter in 
S 11 .) As long as A L A < L™ and A L° < llf , N mm [case (zv)] = 2 . In this case, there are three 
types of parsimonious indel histories (Figure S3), (e) The deletion of the ancestral sites 


followed by an insertion of A L° sites, 


M d ( 2, AT 4 +1), M,{ 1, AL°)j (Figure S3a). (f) An 


insertion immediately on the right of the ancestral sites to be deleted, followed by the deletion, 
'Mj(AL a + 1, AL° + /), M d (2, AL a + / + l)j with / = 0,...,min{4° - AL°, L c ° - \L A } ( e.g , 


Figure S3, panels b and d). And (g) an insertion immediately on the left of the ancestral sites 


to be deleted, followed by the deletion, 


M,( 1, A L d + /), M d (\L d + 2, \L a + AL° +1 + 1)] also 


with l = 0,...,min{L™ - A L°, L™ - A L A } {e.g.. Figure S3, panels c and e). In this case, each 
next-parsimonious indel history is composed of three indel events, and classified into one of 6 
broad types: (h) two successive deletions followed by an insertion; (i) a deletion, followed by 
an insertion, followed by a deletion; (j) an insertion followed by two successive deletions; (k) 
a deletion followed by two successive insertions; (1) an insertion, followed by a deletion, 
followed by an insertion; and (m) two successive insertions followed by a deletion. And these 
six broad types can be further sub-classified into 24 sub-types, as described in Appendix A 1.2 
of [43]. There, the calculations of the total parsimonious contribution 


([case (zv); AL', AL n j) and the total next-parsimonious contribution 


{li P0) \case (zv); A L A , AL°j) 


are also detailed. 


SM-3. Integral equation system for “exact” multiplication factors for local PWAs 

In this section, we derive a system of integral equations to give practically exact solutions (or 
“exact” solutions, for short) of the multiplication factors for cases (i) and (ii). (Another 
system of integral equations, which gives “exact” multiplication factors for cases (i) and (iii), 
is derived in Appendix A1.3 of [43].) 

Here, we assume that the indel rates are locally homogeneous, which means that the 
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rates do not depend on the exact positions that the indels hit, as long as they are confined in 
the region that accommodates the local history. Thus, we assume that the indel rate is locally 
homogeneous and the exit rate is locally an affine function of the (local) sequence length, but 
that they may be non-homogeneous globally. (In terms of equations, we locally assume 
Eqs.(5.1.1a,b) of [49] for the indel rates and Eq.(5.2.4) of [49] for the local exit rate, but we 
assume something like Eqs.(5.3.2a,b,c) of [49] for the global exit rate.) We are now 
considering only cases (i) and (ii), in which (local) ancestral and descendant states should be 

s A = [l, 1,...,AL a , and s D = [L, /?], respectively, with AL a = 0,1,2,.... Because of the 


local homogeneity, the exit rate R^is, t) of a state s (G S) in this context depends only on 
the (local) sequence length, L(s) = 2 + AL(s) . Thus, A L(s) adequately represents the local 
sequence state s , and we let R™(AL(s), t ) denote its (local) exit rate. The starting point of 
the equation system is the fundamental integral equation (Eq.(R4.5) of [22]) for the 
finite-time transition operator, P ID (t,, t F ) . We sandwich the fundamental integral equation 


with 


and 



, and expand the instantaneous mutation operator, 


Qm (0 = 2m(0 + Qm(0 > using the components’ definitions (i.e., Eqs.(R3.12,13) of [22]). 
Because we know that no indels struck the flanking PASs (with ancestries L and R), we 
can ignore the effects of indels that hit the PASs. And, because we are now focusing on the 
local alignment, we will also ignore the indels completely outside of the region delimited by 
the PASs. Thus, we have: 


\P ,D (t n t F )\ 

sl a +\ l c ,° 


D\ A\ D 

S )= (S 5 


exp |- dt R'x (AT 4 , t)| 

+ ^ ^v\-f t dTRf(\L A ,T^g I {l,t)(s A \M I {x,l)P ,D (t,t F )\s D ) 

x-l l-l ' L 1 J 

min{A L a ,L c d °} M a -1+2 r , . 

+ 2 2 S' Fdt ^P\-J' drR x D {kL A ,T)\g D (l,t)(s A \M D (x,x + l-Y)P ID {t,t F )\i 


— Eq.(SM-3.1) 

(Here, g,{l,t) is the rate of an insertion of length /, and g D (l,t) is the rate of a deletion of 
length /.) In the present setting, the number of sites between the PASs, A L(s) , uniquely 
determines the local sequence state s (or, more precisely, the equivalence class of sequence 
states in the sense that they give the same probability of the finite-time transition to 

(s D | = (0 1 ). Thus, we let the local states denoted as (s A | = ^A L A |, (s D | = (0 1 , 
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(s A |a/ 7 (x,/) = L a + /|, and (s A |m d (x,x +1 -1) = L A -l |. We also introduce the notation, 

P ID (AL i—> AL'; [/, Z'|) = (AL| L)|AL'^ . Then, taking advantage of the independence of the 
indel rates, exit rates and P ID (AL I—> AL'; [t, t']) on the position of each indel (x), we have: 
P ID [AL a ^ 0; [t n t F ]) = 8(AL a , 0) exp| -f' F dt R x (AL a = 0, f)J 

L CO 

+ (A L a +1)2 f‘ F dt exp j- f dr Rf ( AL a , r jig,(/, t) P ,D (A L A +1 h-> 0; [t, t F ]) 
w ' L 1 7 J 

min{ AL a , } p . . 

+ ^ (AL A -/ + l)J' F Jt exp|-J'jri?f (AL A ,r)ig D (/,0/ ,/D (AL A -/^0;[Lt F ]) . 

M 7 L 1 7 J 

— Eq.(SM-3.2) 

Here <5(AL, AL') is Kronecker’s delta, which equals 1 if AL = AL', and 0 if AL ^ AL'. 
Eq.(SM-3.2) gives the desired system of integral equations for the “exact” probabilities, 

P ID (AL 1 h-> 0; [/,, t F ]j, with non-negative integers AL A = 0,1,2,.... This equation holds for 

every non-negative integer AL A and even if we replace the initial time /, with any time in 
the closed interval [ t,, t F ]. Thus, the equations can be solved iteratively, starting with the 
“zero-event approximation” of the probability, 

P^ (AL 1 h-> 0; \t, t F ]) = <5(AL a , 0) exp j-J r dr R' x (AL 1 = 0, r j|, and calculating the 

approximation at the n s th step, P F ’ (AL 4 i—> 0; [/, t h ]j, from the approximation at the 
previous step via the recursion relation: 

P''\ } (A L a 0; [t, t F ]) = <5(AL a , 0) expj- /'' dr R x (AL A = 0, r)} 

L co 

+ (AL 4 +1)2/; ^'[expj-j; dr R‘ x ° (AL A , r)l g/ (//) L ; “, (AL A + / ^ 0; \t\ t,\) 

l=\ ^ J ■ 

min{A L A ,L C »} 

+ 2 ( A ^-' + D f r " dt ’ ex P -/' dr Rf (AL A , r) \g D (l,t’) 1} (AL A -/ h-> 0; [t\ t F ]) . 

/=i L l J J 

— Eq.(SM-3.2’) 

If N id iteration steps are performed, the resulting probability, L^/AL 4 i—> 0; [t r , t F ]j (= 

2 j;(AL 4 (/,, )|0) , where P' X) U n t F ) is the collection of terms with N indels each), is 


14 



the summation of the probabilities over all possibly responsible local histories consisting of 
up to (and including) N m indel events. After the iteration is finished, the multiplication 
factor will be obtained by following its definition ( i.e ., Eq.(R6.3) in [22]). We have: 

N.id 

jlp ID ' [cases (/) & (//); A L a ] = ^ i U P( N ) [cases (/) & (a); A L A j 

N=0 

= P^ id) (aL a ^ 0; [t n t F ])/p[([],[t n t F ]) | (s A , t,)] - Eq.(SM-3.3) 

= exp{+ f " dt Rf (A L\ f)} P” m) (A L a ^ 0; [t„ t F ]). 

The accuracy of the numerical solutions will depend on how finely we partition the time 
interval [i ; , t F ] . If the interval is partitioned into N P equal-sized sub-intervals, we could in 

principle achieve an accuracy of o((N P )~ 4 ^ under Simpson’s rule (e.g., [63]). However, as 


the number of sub-intervals increases, it would take longer to complete the calculation. A 
naive implementation of the aforementioned numerical iteration would have the time 

complexity of O ( N ID ( L co ) 2 (N P ) 2 j and the space complexity of O^L co N P j, when we want 


the probabilities taking account of up to N ID indels per local region, with 
A L a = 0,1,2,..., AL'' iax (< L co ). Here we set llf = L™ = L co . This becomes impractically slow 
when either L co or N P is large, e.g., around 1000 or greater. It is likely that N P does not 
have to be this large, as it would be usually enough to set N P around 100 or smaller. 
However, L co will often be around 1000 or greater, indeed making the naive algorithm too 
slow to be practical. Fortunately, we can avoid this problem by rewriting the recursion 
equation, Eq.(SM-3.2’), as: 


P'" } (A L a h-» 0; \t, r F ]) = <5(A L\ 0) expj- f' F dr Rf ( A L A = 0, r) 

+ f F dt' expj- f dr Rf [m\ t)| d>™ } (AL A ^ 0; [t', t F ]) 


— Eq.(SM-3.4a) 


Here, the “auxiliary function,” . (AT 4 0; [/', t h ]j, is given by: 



i—> 0; [t, t F ]j 


w — Eq.(SM-3.4b) 

min{AL A ,z£ 0 } 

+ 2 [(AL a -/ + l)g fl (/,0 P^_ 1} (aL a -/ 1-> 0; [t, i F ]) . 

/-i 

Consider the following “two-sub-step” algorithm. In the first sub-step (in each iteration step), 


it calculates x ( AL' 1 i—> 0; [/, l h ]j’s via Eq.(SM-3.4b) and stores them for all 
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AL a = 0,1,2,..., L co at all time points, t = t r + i AA with i = 0,1,..., N P . And, in the second 
sub-step, it uses them to calculate the probabilities P"\ (AT' i—> 0; [/, t F ]) via Eq.(SM-3.4a) 

for the same sets of values of A L A and t . This algorithm can reduce the time-complexity to 
0{n id L co (L co + N P )Np} while keeping the space complexity to be o(L co Np^j . This 

algorithm does finish in a practical amount of time (typically on the order of an hour or 
shorter when implemented in Perl). But it may still be too slow to perform each time we 
evaluate the probability of a local MSA. Good news is that a single run of the iteration 
algorithm inevitably calculates the probabilities for all A L A = 0,1,2,..., AL') iax (< L co ) at all 
temporal partitioning points, t = t r + i^~ (i = 1,..., N P -1), as well as at t = t, (originally 
desired) and t = t F (trivial). Thus, once we calculate the probabilities with a fixed set of 
model parameters, we could use them to calculate the probabilities of various alignments 
(under various phylogenetic trees), as long as the model parameters remain unchanged. In any 
case, the time and space complexities might be further reduced without considerably 
compromising the accuracy by a clever beforehand discarding of terms unlikely to contribute 
significantly to the final probabilities. (And the computation will speed up at least 10-fold if 
implemented, e.g, in C.) Figure 2 shows the ratios of the multiplication factors, Eq.(SM-3.3) 
at N id = 1, 2, 5,10, 20 iteration steps, to that at N ID = 200 steps. When N ID > 2 , we 

actually started from N lD = 2 , at which the probabilities ( P'^ (AL 4 i—> 0; [/, t,_ ]j ’s) were 

calculated as explained in section SM-2, instead of from N ID = 0 as mentioned above, in 
order to enhance the accuracy of the approximation. As indicated by Figure 2, the accuracy of 
the probabilities improves in a step-wise manner as the number of iterations increases. 

Following the similar procedures, this time starting from the integral equation, 
Eq.(R4.4) in [22], we can also derive a system of integral equations for the multiplication 
factors for cases (i) and (iii), as described in Appendix A1.3 of [43], Thanks to the symmetry 
between the probabilities under the time reversal, Figure 2 can also be interpreted as the 
results of numerical calculations of this equation system under the same setting. 

SM-3.1. Fitting power-law to finite-time transition probabilities 

To examine how well the power-law function fits the finite-time transition probabilities 

( P ID (o h-> A L d ;\ l n t ]j with /£[/,, t F ] ) of case (iii) local PWAs, we performed the 
correlation and linear regression analyses on the log-log plots between A L° and 
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~(N ,d 200 ) ^ case ( 'I] . We used log(AL /J ) as the independent variable (X ) and 


log|d^ Vm ,00 ^| case (: in ); A L?\[t n t]jj as the dependent variable ( Y ). We weighted each point 

with A L d (=1,2,... ,300), by u^'"^ 2m/ \yase (Hi): A L°; \l n t] j, in order to mimic the 

population of the observed points in a virtual analysis based on these finite-time probabilities. 
These analyses were performed with A / : X D = 1:1, 1:3 and 3:1, and with other 
parameter setting described in section Ml of Methods. 

SM-4. Analytical expressions of parsimonious and next-parsimonious contributions (2): 
for local MSAs 

Compared to contributions to local PWAs, those to local MSAs are much more complex. In 
this section, we consider some simple but common patterns, under a tree T with three OTUs, 
corresponding to the external nodes, n, , n 2 and n 3 . Here, we regard its single internal node 
as the root node n Root for simplicity (panel a of Figure 4) . Let b m (in = 1,2,3) be the branch 
that connects the nodes n Root and n m . Let s m G S u (m = 1,2,3) be the (local) sequence 
state at node n m . Then, we consider the gap-configurations (or, more precisely, the homology 
structures) of the MSAs of the three sequences, a[y, s 2 , ,s ’ 3 J , as well as the consistent 
sequence states s Root F5 W at the root node n Root . As in the previous sections, we focus on 
the portion of MSAs delimited by a pair of PASs, whose ancestries are denoted as L and R . 
Here we consider four typical situations (see Figure 4; see also Subsection 3.4 of [49] for 
complexities concerning this issue). (I) None of {5 1 ,5 2 ,5 3 } has any site in between the PASs 
(Figure 4, panel b). (II) 5, and s 2 share the identical set of sites in between the PASs, but 
s 3 has no site in between (panel c). (Ill) y has a set of sites in between the PASs, but 
neither s 2 nor s 3 has even a single site in between (panel d). And (IV) s, has a set of sites 
in between the PASs, but s 3 has no site in between, and s 2 lacks a run of some, but not all, 
of contiguous sites of s, in between the PASs (panel e). These situations are not restricted to 
the 3-OTU trees but widely applicable to each portion surrounding a trivalent node of any tree 

topology, although they never exhaust all gap configurations. The time at n Root will be 
represented as t 7 , and the time at node n m will be represented as t F . m . The indel parameters 

along branch b m will be indicated by the subscript “ :m .” 

Case (I) is represented by the external sequence states s, = s 2 = s 3 (Figure 

4b). In this case, we have N min [cave (/)] = 0 . And the set of fewest-indel local histories, 
A-[V k = 0; C K ; T], is composed only of a no-indel history: 
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| M(b l ) = M(b 2 ) = M(b 3 ) = [ ]J, — Eq.(SM-4.1) 

with s Roor = s Root = [L, R]. Thus, according to Eq.(SM-1.15), supplemented by Eq.(SM-1.16) 
and Eq.(Rl .9), the total parsimonious contribution is: 

M P(0) [case(/)] = l. — Eq.(SM-4.2) 

Here we used [i p s Rm " , n Root ; C K ] = 1. No single-event local history can result in the gap 

configuration in this case. Each next-parsimonious indel history consists of two indels, and it 
can be represented as: 

{^(&J = [m / (U),M d (2,/ + l)], ] for Ve{1,2,3} 

— Eq.(SM-4.3) 

with mG {1,2,3}, / G |l,2,...,L™ (= min{L™ , , and with s Root = s Roo ‘ again. Thus, 


the total next-parsimonious contribution can be calculated similarly to Eqs.(SM-2.2a,b). We 
have: 

L co 

M p(0) [case(I)]= J [([m,(1,Z),M d (2,/ + 1)], [t n t F , j) | (s Root , f,)] . - Eq.(SM-4.4a) 

m=l,2,3 /=1 L J 

Each summand can be calculated from Eq.(SM-2.2b), by replacing s A with s Root and also 
replacing the time and rate parameters with those assigned to each branch. Especially, under 
Dawg’s model, each summand is calculated as: 





= KJiJJ) K.JdJ 1 ) 


eX p( T)) l + + h ) 

{<,K. m +KJi) 2 


— Eq.(SM-4.4b) 

If the three branches share the same time interval and the indel rate parameters, the total 
next-parsimonious contribution, Eq.(SM-4.4a), is exactly three times Eq.(SM-2.2a) for a 
PWA. Indeed, this total next-parsimonious contribution on a general tree can be calculated by 
summing Eq.(SM-2.2a) (with appropriate parameters) over all branches of the tree. Following 
the same line of reasoning as around Eq.(SM-2.3), this total contribution is roughly 
proportional to the summation of the squared branch lengths over all branches. This means 
that a richer sampling of the homologous sequences will not significantly increase, or might 
rather slightly decrease, this total contribution, as long as the maximum evolutionary distance 
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remains at a similar level. Incidentally, any root sequence state of the type 
s Soo '=[l,1,...,AL Soo ',r] is also consistent with «[y,s 2 ,s 3 J in this case. Such a state, 


however, would require at least three indels, in order to delete the extra sites (1 ,...,AL Root ) 
independently along the three branches. Thus, the contributions from the local indel histories 
with such root states would be smaller in general. 


Case (II) is represented by the external sequence states 5, = s 2 = [l, 1,..., AL m2 , 7?] 


and v, = [L, R] (Figure 4c). In this case, the “phylogenetic correctness” condition (see, e.g., 

[46,47]) dictates that the root state s Root must have all the sites with ancestries 1,...,AL° 12 . 
In this case, we have N min [case (//)] = 1. And the set of parsimonious local histories, 

A™ [N k = 1; C K ; a[Sj,5 2 ,s 3 ]; T ], consists of a single element: 


M(b { ) = M(b 2 ) = [ ], M(b 3 ) = [M D ( 2,AL° 12 +1)] , 


— Eq.(SM-4.5) 


with s Root = s R °"' = | L, 1,...,A L on , . Again, according to Eq.(SM-1.15) supplemented by 

Eq.(SM-1.16) and Eq.(R1.9), the total parsimonious contribution turns out to be exactly the 
same as Eq.(SM-2.4) for case (ii) of PWAs, with the parameters replaced with those assigned 
to the branch b 3 , and with A L A replaced with AL D ' 2 . Especially, under Dawg’s model, we 
have: 


M ni) [case (II); AL° 12 ] = k D .J D . 3 (M ml ) 


exp|(A /:3 + A a3 )A L dv (t F:3 -1 ,)j -1 
(A/ :3 +A D:3 )AL D12 


— Eq.(SM-4.6) 

As in case (ii) of PWAs, each next-parsimonious indel history is composed of two indel 
events, and there are two types of such histories. One is based on type (a) in case (ii): 


M(b l ) = M(b 2 ) = [ ],M(b 3 ) = 


M D (x,x + l - 1), 


M D (2,AL Dl2 -l + \ 


»] 


Eq.(SM-4.7a) 


with l = 1,..., AL dp -1, x = 2,..., AL mi -1 + 2 , and also with s Root = s Ro °'. And the other is 
based on type (b) in case (ii): 


M(b x ) = M(b 2 ) = [ ], M(b 3 ) = [Mj(x,1), M d (2,AL D12 + / +1)] , — Eq.(SM-4.7b) 


with / = 1,..., min{L™, Lq° 3 - AL DP } , x = 1,..., AL DP +1, and also with s Root = s Root again. 
Thus the total next-parsimonious contribution is given by: 
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M 


P(. 2) 


[case (II); AL° 12 ] = M P [(a); A L on ] + M p [(fc); AL° 12 ]. — Eq.(SM-4.8) 


Here M p [(a); AL DI2 ] and M,,[(/?); AL /Jl2 1 are given by exactly the same equations as 

Eqs.(SM-2.5b,d) and Eqs.(SM-2.5c,e), respectively, with the aforementioned due 
replacements. Especially, under Dawg’s model, the contributions by the individual 
next-parsimonious indel histories are given by Eqs.(SM-2.5d’,e’) with the due replacements. 
Thus, Figure S4 can also be interpreted exactly as the comparison between the total 
parsimonious contribution and the total next-parsimonious contribution in the current case. 
Incidentally, root sequences with some additional ancestral sites in between the PASs of 

Sq°°' = \l, 1,..., AL D12 , are also consistent with a[v,, s 2 , s 3 J in this case. However, such root 

sequence states require at least three indels each to give rise to a[Sj,s 2 ,s 3 ]. This is because 
the additional ancestral sites need to be deleted independently along b x and b 2 , even if they 
are deleted simultaneously with the sites with the ancestries 1,...,AL DI2 along b 3 . Thus, in 
general, the indel histories consistent with such root states are expected to contribute much 
less to the multiplication factor. 


Case (III) is represented by the external sequence states s x = | L, 1,...,A L m , 7?] and 


(Figure 4d). In this case, the phylogenetic correctness condition does not 
require the root state to have any site in between the PASs. Thus we have s Root = [L, 7?]. As 
in case (II), we have N mm [case (///)] = 1. And the set of parsimonious local histories, 

A™ [N k = 1; C K ; a[,s p s 2 ,s 3 ]; T ], consists of a single element: 



with s Root = s Root . Again, as in case (II), the contribution by this local history turns out to be 
exactly the same as Eq.(A 1.1.1) (in Appendix A 1.1 of [43]) for case (iii) of PWAs, with the 
parameters replaced with those assigned to the branch b x , and with A L° replaced with 
AL di . Especially, under Dawg’s model, we have: 


l-exp(-(A /:1 + A D;1 )AL D1 (t F:1 -/y)) 
(A/:i+A D: 1 )AL di 


M PW [case (111); AL m ] = A /;1 / /:1 (AL 01 ) 


. — Eq.(SM-4.10) 


As in case (iii) of PWAs, each next-parsimonious indel history is composed of two indel 
events. Unlike case (iii) of PWAs, however, there are three types of such histories. Two of 
them (classified as “type (A)”) are similar to those in case (iii), but the other one (classified as 
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“type (B)”) is totally new. Specifically, the first one is based on type (c) in case (iii): 


M(b l ) = [M I (l,AL m -l),M I (x,l)],M(b 2 ) = M(b 3 ) = [ ] , — Eq.(SM-4.11a) 


with / = 1,..., AL DI -1, x = 1,..., AL D1 -l + l , and also with s Root = . The second one is 

based on type (d) in case (iii): 

[m(^) = [m 7 (1,AL d1 + /), M D (x,x + l- 1)], M(b 2 ) = M(b 3 ) = [ ]} , — Eq.(SM-4.11b) 

with / = 1,..., min{L™, L™ - AL D1 } , x = 2,..., AL /)l + 2 , and also with s Root = s Root again. The 
third one involves events along b 2 and b 3 , instead of along b { : 


M(b x ) = [ ],M(b 2 ) = M(b 3 ) = [M D (2,AL D + 1)]J. — Eq.(SM-4.11c) 

It is consistent with the root state s Root = 5, = [ L, 1,..., A L m , /?] instead of s Roo ‘ = [L, 7?]. It 

should be noted that there is only one local history of the third type. In this case, therefore, the 
total next-parsimonious contribution is given by: 

M P(2) [case (III); AL D1 ] = M P(2) [case (III); (A); AL Dl ] + M P(2) [case (III); (B); AL DI ] 

— Eq.(SM-4.12a) 

M p(2) [case (III); (A); AL m ] s M P [(c)] + M p [(rf)], — Eq.(SM-4.12b) 


M P(2) [co5e (III); (B); AL D1 ] ^ M P [(3rJ)]. 


Eq.(SM-4.12c) 


Here, M P [(c)] and M r [ (d) ] are the summed contributions of the type (c)-based and type 
(d)-based histories, respectively. They are given by exactly the same equations as 
Eqs.(A1.1.2b,d) and Eqs.(A1.1.2c,e), respectively, in [43], with the aforementioned due 
replacements. Under Dawg’s model, these two terms are given by summations of 
Eqs.(A1.1.2d’,e’) in [43] with the due replacements. Thus, Figure S4 can also be interpreted 
as the comparison between the total contribution of these two types of next-parsimonious 

indel histories (M P(2) | case (III); (A); A L m j) and the total parsimonious contribution 


(M P(1) [case (III); A L m j). Meanwhile, M p [(3rJ)] is the contribution from the unique 

next-parsimonious indel history of the 3rd type (i.e., type (B)), Eq.(SM-4.11c). According to 
Eq.(SM-1.15) supplemented by Eq.(SM-1.16) and Eq.(Rl .9), it is expressed as: 
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M p [(3rJ)] = /i p [ 


Root Root Root 

s =s l ,s 0 ,n 


; C K ] expJ - 2 ST dt dR "'"‘^ R "°' = ' s '’ s °° 0t ’ l) 


n= 1,2,3 


X Yi fT dt r O:»<( 2 ’ ALD1 +1; sRO °' = S 1>A> expj-j S dt dR "-rn( S m’ S ' K ""' = S l>0 

m=2,3 ' L 


— Eq.(SM-4.13) 


Under Dawg’s model, we have 8R£ m (s '= s lt s“ oot ,t)\ b = (A /:m + A D:m )AL for w = 1,2,3 


and 




(A, 


-V)k— (4» + ^:JAi D 


for m = 2,3 . Moreover, if we assume the 


uniform distribution of the root sequence length, we have ji P [^ oor = 5,, s^°°' 


J P J 0 ’ 


n Soor ;C K ] = 1 . 


Thus, Eq.(SM-4.13) is reduced to: 
M P [(3rJ)] = exp(-(A /:1 + A ftl )A L Dl (t F:1 -tj)) 


n 

m= 2,3 


K,J D ,S^ m ) 


l-exp(-(A /:i 

+ A n )A L m (t F 

D.m' \ F:m 

-A)) 

(A/ : 

+ A n )AL°‘ 

n D:m' 



— Eq.(SM-4.13’) 

Figure 5 shows the ratio of M P [(3rJ)] (= M P(2) Y ase (B); A L Di j) to the total 


parsimonious contribution, Eq.(SM-4.10), when all three branches have the same length and 
are assigned the same indel model as that used for Figure S4. Because the ratio compares the 
multiplication factors concerning the indel events along different branches, its value actually 
depends on several factors. It would be convenient to keep in mind that the ratio could be 

approximated by A rt:2 / 0:2 (AL rtl )(t F . 2 -1,) A W:3 / 0:3 (AL ,JI )(t F . 3 - b)/[A /:l /p,(AL HI ){t F . A -/,)] when 

(A j. m + A D . m )A L m (t F . m - tj ) ’s are sufficiently smaller than 1 for all m = 1,2,3. In general, as 

AL di gets larger, the ratio is expected to decrease, because the relative frequencies of long 
indels ( f Lm (A L m ) and f D . m (A L Dl )) are small in general. The ratio is expected to be much 

smaller than 1 in general. However, it may become quite large when the relative frequency of 
deletions compared to insertions ( i.e ., the ratio A />m y / A / . m ) is considerably larger than 1, or 
when the lengths of b 2 and A, are much larger than that of b x ( i.e ., 
t F -i ~ 1 n t F -3 ~ 1 1 >:> A-1 - A )• Such situations are similar to those causing the “Felsenstein zone” 
regarding a substitution model, where a non-parsimonious substitution history at a site is most 
likely to occur along a tree (see, e.g.. Chapter 9 of [5]). Under the conditions used to draw 
Figure 5, an indel history of the 3rd type (i.e., type (B)) has a probability much smaller than 
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that of the parsimonious indel history. The former is less than 5% of the latter even when as 
much as 0.4 indels per site are expected to occur. This is probably because the type (B) 
history requires an exquisite spatial coordination of deletions along different branches. And 
the result implies that the “Felsenstein zone” of indels should generally be quite narrow, 
consisting of the cases where a node is connected with branches with extremely unequal 
lengths. 

Case (IV) is represented by the external sequence states, 5, = | L, 1,..., AL HI , i?], 


= [l, j,...,AL m , i?], and ,v 3 = [L, R], with 1 < / + 1 < j < AL DI +1 


but 


(i, j ) ^ (0, AL D1 +1) (Figure 4e). (Here “ 1,...,0 ” and “ A L Dl +1,..., A L m ” should be considered 
to be empty.) In this case, the phylogenetic correctness condition requires the root state to 
have sites with ancestries 1,...,/ and j,...,AL Dl , on top of the PASs. Thus, we have 


s Root = s 2 = [l, 1,...,,/, j,...,AL Di , r]. Here, the minimum number of indels is 


N min [cave (/V)] = 2 . And the set of parsimonious local histories, 

A™ [N k = 2; C k ; a[Sj,,s 2 ,,s 3 ]; T ], consists of two histories. One starts with the root state 
s Ro °t = s Ro °' (= ,v 2 ), and is represented as: 


M{b x ) = [M,(i + 1, j - i - 1)], M(b 2 ) = [ ], M(b 3 ) = [M D ( 2, AL m -j + i + 2)] 


— Eq.(SM-4.14a) 

The other starts with the root state s Root = s, = [l, 1,...,AL d1 , , which differs from 


Root 


(= s 2 ). It is represented as: 


M(b x ) = [ ],M(b 2 ) = [M D (i + 2, j')], M(b 3 ) = [M D ( 2, A L m +1)] . 


— Eq.(SM-4.14b) 


The total parsimonious contribution and the total next-parsimonious contribution are 
calculated in Appendix A2 of [43] . 


SM-5. Algorithm to compute first-approximate MSA probability 

As briefly mentioned in section R5 of Results and discussion, we developed an algorithm that, 
under a given phylogenetic tree of the sequences and a given indel model (including its 
parameters), calculates the first-approximate probability that a given MSA actually occurs, 
using only the parsimonious indel histories consistent with the MSA. As a byproduct, the 
algorithm also calculates the relative probabilities among the parsimonious indel histories 
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(more precisely, among the parsimonious ancestral state sets). In this section, we will describe 
the algorithm. Then, in Sections SM-6,7,8, we will describe some analyses that were 
performed to validate the algorithm. 

In this study, when we refer to a “MSA,” we consider only its homology structure 
[39], and other details (including residue states) are ignored. For example, the “probability of 
a MSA” means the probability of the homology structure of the MSA under a given genuine 
indel evolutionary model. It should be noted here that the algorithm proposed here assumes 
that the input MSA is correct. Under this assumption, the algorithm approximately calculates 
the probabilities concerning the MSA. 

SM-5.1. Outline 

Panel A of Figure S5 shows a flowchart of the procedures comprising our entire algorithm. 
Broadly speaking, the algorithm consists of three parts: (i) the “pre-processing” procedures 
that finally partition the entire input MSA into gapped segments (/.<?., local MSAs) and 
gapless segments separating them (steps ia-ic); (ii) enumerating the parsimonious local indel 
histories that can explain each gapped segment (step ii); and (iii) calculating the 
first-approximation of the augmented multiplication factor 

(Mp^^^afsj, s,,..., s^]; s* 00 '; C K | r], given by Eq.(SM-1.15)) contributed from each 

gapped segment (C K ) (step iii). The final results thus produced are put together, along with 
the overall factor (Eq.(SM-1.13)) to provide the first-approximation of the total occurrence 
probability of the entire MSA (Eq.(SM-1.12)) as well as the relative probabilities among the 
parsimonious local indel histories that could explain each gapped segment (step iv). Panel B 
of Figure S5 schematically illustrates the procedures constituting the pre-processing part 
(steps ia-ic). The steps (ii) and (iii) will be described in Subsections SM-5.2 and SM-5.3, 
respectively. 

After an input MSA is given [ Figure S5, step (o) ], the algorithm first reduces the 
MSA to a binary pattern. In the binary pattern, each cell specified by a row (sequence) and a 
column (site) is given any of the two states: “presence” (denoted as “1”) when the cell is 
occupied by a residue, or “absence” (denoted as “0”) when it is filled with a gap [ step (ia) ] . 
Then the algorithm decomposes the MSA into “gap-pattern block”s, or “block”s for short, 
each of which consists of contiguous columns with the same presence/absence pattern [ step 
(ib) ] . Among such blocks, those containing no absence state play a distinct role as separators. 
If the MSA is correct, the existence of a gapless column indicates that no indel events 
occurred on or pierced the column. This is a corollary of the phylogenetic correctness 
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condition ( e.g., [46,47]). Thus, gapless columns flanking a gapped segment genuinely delimit 
the indel events potentially responsible for the segment, even if they constitute 
non-parsimonious histories. We have another note. In general, two contiguous gapless 
columns do not preclude indels in between them (e.g., a next-parsimonious history, 
Eq.(SM-4.3), yielding a case (I) local MSA). Nevertheless, the algorithm described here 
ignores such indels between contiguous gapped columns, because it is only interested in 
parsimonious indel histories. 

Then the algorithm makes a “cluster” out of a run of contiguous blocks containing 
the absence state and not separated from each other by gapless columns [ step (ic) ]. Thus, 
each cluster spans between a gapless segment and the next gapless segment (or a MSA end). 

In this paper, we simply refer to such a cluster of gap-pattern blocks as a “gapped segment” 

(or a local MSA). As explained in [22], indel events and the probability of a local indel 
history in each gapped segment can be considered independently of events in the other gapped 
segments (even if we allow for non-parsimonious indel histories), as long as the indel model 
fulfills conditions (i), (ii) and (iii) (given in section R1 of Results and discussion). 

After the pre-processing part (step (i)) explained above, the two core parts follow: 
enumerating parsimonious local indel histories for each gapped segment (step (ii)), and 
calculating the multiplication factor from the segment (step (iii)). They will be explained in 
SM-5.2 and SM-5.3 below. The “post-processing” step (iv) will also be explained in SM-5.3. 

SM-5.2. Enumerating all parsimonious local indel histories 

The first core part of our algorithm is itself an algorithm. It attempts to enumerate all 
parsimonious local indel histories that can yield each gapped segment. This core part consists 
of two subparts. (1) First it constructs an initial candidate for the local parsimonious indel 
histories, by identifying the unique Dollo parsimonious history [64] for each gap-pattern 
block, and by merging together indel events of the same type in effectively contiguous blocks 
and along the same branch of the phylogenetic tree (Figure S6). (2) Then it iteratively 
searches for local indel histories whose events are fewer than or as many as those in the 
current candidate parsimonious histories (Figure S7). And it updates the set of candidate 
histories if such a history is found. It should be noted that, because we consider the input 
MSA to have resulted from an evolutionary process, the candidate indel histories must 
conform to the phylogenetic correctness condition (e.g., [46,47]). We used the Dollo 
parsimonious state [64] (in each gap-pattem block) as a starting point because it conforms to 
this condition. [NOTE: The Dollo parsimony criterion [64] seeks for an indel history 
consisting of the fewest events that can explain the gap-pattern, while only allowing for at 
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most one insertion (per column or block) in order to keep the phylogenetic correctness.] In the 
following, we will explain these sub-parts in more detail. 

(1) Constructing an initial candidate of parsimonious local indel histories. The first 
candidate history is constructed based on the block-wise Dollo parsimonious indel histories. 
The Dollo parsimonious indel history for each block can be easily and quickly constructed by 
a round-trip traversal of the (rooted) phylogenetic tree, first bottom-up and second top-down. 
In the bottom-up traversal, each node ( n ) is assigned the number of child nodes each of 
which has at least one extant descendant node with the “presence” state. Let the number 
denoted as N CDP {n ). When reaching the top (/.<?., the root node n Roo> ), the root is assigned 
the “presence” state if N CDP (n Root ) > 2 , otherwise it is assigned the “absence” state. Then, in 
the top-down traversal, each node (again n) is assigned the “presence” state, either if 

(a) N CDP (ri) > 2 , or if (b) N CDP (n ) = 1 and its parent is assigned the “presence” state. 
Otherwise, the node is assigned the “absence” state. Then, indels are inferred to have occurred 
only along the branches whose ends are in different “presence”/”absence” states. Once the 
Dollo parsimonious history is constructed for each block belonging to the gapped segment, 
the algorithm tries to reduce the number of indels by merging the effectively contiguous indel 
events of the same type (either all insertions or all deletions) and along the same branch in the 
sequence phylogeny (Figure S6). The “effectively contiguous” indel events can be either 
events in literally contiguous blocks (Figure S6, panel A) or events separated only by a (run 
of) block(s) that is (are) devoid of the “presence” state in any ‘downstream’ nodes (in the 
virtual temporal direction such that the event is viewed as a ‘deletion’) (panel B). [NOTE: 
What the single-quotes exactly mean will be explained below (in SM-5.2.1).] When two 
events of the same type along the same branch are intervened by a block with the “presence” 
state in some ‘downstream’ nodes (the red “1” in panel C), however, the events are left 
unmerged. [NOTE: If we deal with the homology structure at this stage, the events could be 
merged even in the situation in Figure S6C. For future use, however, we decided that this 
stage should deal with each gap configuration faithfully as created by a true indel process, 
even if its homology structure indicates other treatments. And we also decided that each input 
MSA should be converted to its homology structure at a pre-processing step (detailed in 
Methods of [38]). ] In most cases, this sub-part determines the unique parsimonious local 
indel history for each gapped segment. 

(2) Iteratively updating the set of candidate parsimonious local indel histories. Not 
always and yet considerably frequently, the first sub-part doesn’t suffice to enumerate the 
parsimonious local indel histories. For example, in the situation illustrated in Figure S7, panel 
A, there could be another parsimonious history (panel C) on top of the initial history 
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constructed in the first sub-part (panel B). In another example (panel D), there even exists a 
history (panel F) that requires less indel events than the intermediate candidate history (panel 
E), which requires as many indels as the initial, Dollo parsimonious history (not shown). Such 
histories can be found by iteratively updating the set of candidate parsimonious local indel 
histories, via “branch-and-merge” operations (panels G, H, F). A branch-and-merge operation 
begins with “branching” a ‘deletion’ event, that is, re-interpreting a ‘deletion’ event along a 
branch (panel G) as multiple independent ‘deletions’, each along one of the ‘child’ branches 
(panel H). [NOTE: The single-quotes will be explained below (in SM-5.2.1).] Then, the 
“merging” process merges each resulting ‘deletion’ event with the effectively contiguous 
‘deletion’ event(s), if at all, creating a new local indel history (panel F in this example). If the 
newly created history requires fewer indel events than the current candidate histories, then the 
new history replaces the current candidates. If the new history requires as many events as the 
current candidate(s), it joins the set of current candidates. Otherwise, the new history is 
discarded and, if some special conditions are met, the algorithm tries a more complex 
“branch-and-merge” operation as an attempt to exhaust all promising histories (detailed in 
Ml.2.2 of [48]). 

If you will, this second sub-part could be called a “local multi-path downhill 
search algorithm.” From each point, i.e., a local indel history, it examines only its 
neighborhoods, which are separated from the point by a single “branch-and-merge” operation. 
In this sense, it is a “local search.” Then, it keeps only those histories that consists of fewer 
indels than, or as few indels as, the current candidate. Thus it is “downhill.” At the same time, 
it keeps all histories that are found to have the same, “current-smallest,” number of indels. 
Hence it has the qualifier, “multi-path.” 

SM-5.2.1. Assigning virtual temporal directions and ordering indel events 
To exhaust (almost) all promising histories, the “branch-and-merge” processes (explained in 
SM-5.2, item (2)) are iterated from the “most influential” ‘deletion’ events to the “least 
influential” ‘deletion’ events (Figure S8). (Each “most influential” ‘deletion’ event ‘deletes’ a 
relevant sub-sequence from the largest number of aligned sequences.) Fet us first explain 
what these single-quoted terms mean. 

First, if the tree of the aligned sequences is rooted (panel A of Figure S8, left), it is 
converted to an unrooted tree (panel B). Then, all the indel events (panel A, right) are 
re-interpreted as ‘deletions’ (panel C). This could be done because the time direction could be 
arbitrarily assigned on an unrooted tree, and because an insertion can be regarded as a 
deletion in the opposite time direction. The time direction may not be assigned consistently to 
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all branches, for example when insertions and deletions co-exist along a branch. However, 
this doesn’t matter and we will assign a unique ‘virtual time direction’ to each indel event , 
because it is only ‘deletion’ events with consistent directions that can be merged together, and 
because this re-interpretation is just a means to determine the order of the events that will go 
through the “branch-and-merge” operations. (And a term will be single-quoted when it 
applies under this virtual time direction.) 

Now, the ‘deletions’ will be sorted in descending order of the number of ‘deleted’ 
sequences (panel D), and they will be processed from top to bottom of the list. The order is 
determined uniquely, except the ambiguity in the ordering among events that ‘delete’ the 
same number of sequences. This ambiguity is not expected to matter seriously, because such 
events won’t be merged together in any “branch-and-merge” process. 

A list of ‘deletions’ to be examined accompanies each candidate local indel history. 
Each time a “branch-and-merge” operation is tried on the ‘deletion’ at the top of the list, the 
list is updated by removing the top ‘deletion’ just examined. If a “branch-and-merge” 
operation succeeds in finding a new promising candidate history, the new history is 
accompanied by a new list created by replacing the examined top ‘deletion’ with the resulting 
new ‘deletion(s).’ (The latter will be incorporated in the right order according to the number 
of aligned sequences that the sub-sequence was ‘deleted from.’) 


SM-5.3. First-approximate calculation of absolute occurrence probability and relative 
probabilities 

The second core part of our algorithm calculates the first approximation of the ab initio 
occurrence probability of a given entire MSA under a given phylogenetic tree and a given 
indel evolutionary model, using only the contributions from parsimonious indel histories that 
are consistent with the MSA. The calculation is based on Eqs.(SM-1.12-16) for the 
probability of a given MSA, a[5j,5 2 ,...,s ¥ ]. [NOTE: The current latest version actually 

calculates the MSA probability based on Eqs.(SM-4.20-22,18,13) of [22]. Together, these 
equations express the multiplication factor from a local MSA as the product of the 
multiplication factors of the constituent local PWAs.] The current version of the 
implementation of this core part only calculates the probability under Dawg’s indel model 
[32], whose indel rate parameters (Eqs.(Rl .7,8,9)) are spatially and temporally homogeneous. 
And the current version uses exclusively a uniform length distribution of the ancestral 
sequence ( s Ro °') at the root ( n Root ): 



— Eq.(SM-5.3.1) 
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Thus, we always have [A p [s Roo ‘, s Root ,n Roo, ;C K ] = 1 for every possible s Ro "' and for every 

potentially indel-accommodating region (C K with V K E {1,..., K max }). Here, as the 
“reference root state” ( s Ro ° l ), we do not use the concatenated root states of the block-wise 
Dollo parsimonious indel histories. Instead, ass Root , we use an array consisting solely of all 
sites corresponding to the gapless columns. Under a space-homogeneous model, this poses no 

problem. Let N CLC (gc[s 1 ,s 2 ,...,s nX ]), or N GLC for short, be the number of gapless columns in 

afSj,^,...,^]. Then, from Eq.(R1.9), we have: 

Rf{s Root , t) = {k I+ k D )N GLC + A^A,, A d , /„(.)], - Eq.(SM-5.3.2) 

8R' x d (s, s r °°‘, t)[C K ] = (A, + A d ){l(s[C k ]) - l(C°'[C k ])} = (A, + A d )L( 5 [C k ]) . 


— Eq.(SM-5.3.3) 

Here, ,v[ C K | is the sub-sequence of the sequence s confined in the region C K . We also 
used the fact that s Roo> [C K ] is always empty. 

Then, the first approximation of each multiplication factor (SM-1.14) is given by: 
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— Eq.(SM-5.3.4) 

where the right hand side is given by Eq.(SM-1.15) with N = N m[n [ C K ] . Using this, the first 
approximation of the probability of the entire MSA is given by a reduced form of (Eq.Rl .5) 
(or Eq.(SM-1.12)). Their explicit expressions are: 


/><'”>[a[s,,s„..., V ]|r]-P 0 [s“ | r] J]M<!">[a[s, V ]; s„"“; C, 



— Eq.(SM-5.3.5a) 

with 


. Root Root 

U 0 , n 


)] X exp{-(A, + A„) |r|iv cic + A„, /„(.)] |r|}. 


— Eq.(SM-5.3.5b) 

Here, |^| = ^ /e(/1 \b\ is the total length over all branches in the tree (T ). In general, the set 


of regions that can accommodate local indel histories, {C K } , also contains the 

^ J K—1,..., K max 

positions sandwiched by adjacent gapless columns within each single gapless segment. In the 
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first approximation, the contribution, M^[a[,s 1 ,,s 2 ,...,,s JvX ]; s Root ; C K | 7'j, from each such 

sandwiched position is always trivial ( i.e ., unity). Thus, Eq.(SM-5.3.5a) could be further 
simplified as: 

K° mm 

P { '" ) [a[i,, s„..„ v 11 T] - P„ [s 0 “ | T] f]M$“>[a[s, V ]; s^';C » | t] . 

K=\ 

— Eq.(SM-5.3.5a’) 

Here, jc^j^ | ^ is the set of all gapped segments in the MSA. It is a subset of 

|C K | , and thus K’^ x < K max always holds. This Eq.(SM-5.3.5a’), supplemented by 

Eqs.(SM-1.15,16) and Eqs.(SM-5.3.3,5b), is the major output of the second core part, and of 
the entire algorithm. [NOTE: But the current latest version uses the local-PWA-based 
expressions, as noted around the top of this subsection.] 

As a byproduct, the second core part also outputs the relative probabilities among 
the parsimonious local indel histories that can give rise to the same local MSA confined in 
each gapped segment, C° K . The relative probability of each parsimonious local history, 

M(£)i [C° K ] , is calculated as: 
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Eq.(SM-5.3.6a) 
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— Eq.(SM-5.3.6b) 

SM-6. Comparing parsimonious local indel histories with true history 
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Rigorously speaking, unless the record is kept on the true local indel history that created each 
observed gapped segment, we cannot compare it with the predicted (parsimonious) histories. 
Nevertheless, if we have the ancestral sequence states at all the internal nodes aligned with the 
“extant” sequences at the external nodes, we can approximately judge whether the true local 
indel history matches one of the predicted (parsimonious) histories, by comparing the gap 
states of all the true ancestral sequences (in the segment in question) with the ancestral gap 
states in each predicted history. [It should be noted, however, that the judgment is only 
approximately correct, because the same set of ancestral sequence states could result from 
more than one local indel history if non-parsimonious histories are also allowed.] Dawg [32] 
can output the alignment of ancestral sequences at the labeled internal nodes with the “extant” 
sequences at the external nodes. Thus, we took advantage of this function and examined 
whether the true ancestral gap states in each instance of a gapped segment match those 
predicted by one of the parsimonious local indel histories. [NOTE: More precisely speaking, 
we compared the ancestral sequence states superposed to the homology structure oft ach local 
MSA of extant sequences.] If there is a match, we registered the instance as “parsimonious,” 
and recorded which parsimonious history can produce the true ancestral gap states. Otherwise, 
we registered the instance as “non-parsimonious.” Dawg sometimes creates gapped segments 
containing null columns, in each of which all extant sequences are occupied by gaps. In this 
study, such null columns were simply removed before the analyses. 

SM-7. Correlation analysis to validate predicted absolute occurrence probabilities of 
gapped segments 

To examine whether or not our first approximation of the augmented multiplication factor, 
Eq.(SM-5.3.4), works well, we first counted instances of gapped segments that occurred in 
each of the simulated sets A1 and A2 without reaching either MSA end, and that showed a 
particular gap-configuration (more precisely, a homology structure), say, G a . Then we 
compared the count of instances (. i.e ., the absolute frequency) of gap-configuration G a with 

its theoretical prediction, \fJ a | rj. The prediction was calculated using Eq.(SM-5.3.4) 

(for a gapped segment C K that exhibits G a ) as: 

iv£ r> [G fl |r] = iV r exp{-A DOT nA / ,A / ,/ D (.)]|r|-A D |r|} 

_ . — Eq.(SM-7.1) 

xM^[«[ 5l ,5 2 ,..., V ];C;C K I r]exp{-A 0 |r|} 

Here N T is the total number of sites in the root sequences where insertions/deletions 
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potentially occur. Roughly speaking, exp{-A Dowg [A / ,A / ,/ D (.)]|7'|-A D |r|j is the probability 

that the left-flanking gapless column remained undeleted. And expj-A D |r| | is the 

conditional probability that the right-flanking gapless column remained undeleted, given the 
gapped segment and the left-flanking gapless column. In this study, we simply set 
N t = 100,000 x 1,000 = 10 8 , which is the total number of bases in the root ancestral sequences 
in each input set. We used only those gap-configurations each of which is expected to occur 5 
or more times in each dataset. 

We compared the absolute frequencies predicted by Eq.(SM-7.1) against their 
actual frequencies in each simulated dataset by performing the correlation and linear 
regression analyses between their square roots. We did so based on the following rationale. 
The count of each gap-configuration in each simulated dataset is expected to roughly follow a 
Poison distribution, in which the standard error of the count of events is the square root of its 
mean. Therefore, the square root of the simulated count is expected to have a standard error 
that is roughly uniform independently of the gap-configuration. This uniformity of the 
standard error is a major assumption underlying the correlation and linear regression analyses. 

Before the analyses in this and the next sections, we pre-processed the simulated 
MS As so that MS As with a same homology structure [39] will be represented identically. See 
Methods of [38] for details. 

SM-8. Correlation analysis to validate predicted relative probabilities among 
parsimonious local indel histories 

To examine whether or not our formula for the relative probability, Eq.(SM-5.3.6a), works 
well with each of the simulated sets, 1A, IB, 3P, 3M and 3F, we first calculated 
Eq.(SM-5.3.6a) for all alternative parsimonious local indel histories of all “parsimonious” 
instances of gapped segments that do not reach either MSA end. Then, we distributed the 
histories enumerated for each input set into 20 non-overlapping bins of 5% width that jointly 
span the open interval, (0,1), of the theoretical relative probability. (The histories with the 
relative probability = 1 were excluded from the analyses because they could cause the 
performance to be unfairly overrated.) In each bin, we counted all instances of alternative 
histories considered, and we also counted actual instances of “correct” histories, whose 
ancestral gap states matched the true ones. Then, we compared the simulated proportion (i.e., 
relative frequency) of “correct” histories in each bin with the theoretically predicted 
probability that the history is “correct,” P^!\bin ). The P^Xbin) for each bin was calculated 
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by averaging the relative probabilities given by Eq.(SM-5.3.6a) over all instances of the 
considered alternative histories in the bin. We performed the correlation and linear regression 
analyses, using the actual relative frequency in each simulated dataset as the independent 
variable (X ), and using the predicted probability as the dependent variable (Y ). For the 
analyses, we used averages weighted by the reciprocal of the variance of the predicted 


bin 


probability, i.e., [weight} = 


where \bin\ is the total number of 



instances of considered alternative histories in the bin. 

Similar correlation and linear regression analyses were conducted also on the most 
likely (ML) parsimonious local indel histories alone, as well as on the least likely (LL) 
parsimonious histories alone. 

SM-9. Accuracy of HMM of Kim and Shinha applied to case (iv) local PWAs 

Here, we specifically examine the model of Kim and Sinha [36] in the light of our ab initio 
theoretical formulation. Their model is a generalized HMM, and calculates the probability of 
a PWA between the ancestral and descendant sequences along a branch as a product of 
block-wise probabilities. In their HMM, a block is either a column of a PAS, a run of gaps in 
the ancestor aligned with a run of residues in the descendant, or a run of gaps in the 
descendant aligned with a run of residues in the ancestor. Each PWA is actually a part of a 
MSA of given sequences at the external nodes and one of alternative sets of sequences at 
internal nodes. Ancestral gaps aligned with descendant gaps are removed before evaluating 
the probability of a PWA. Because their purpose is to find a most likely indel history and a 
resulting set of consistent ancestral sequence states at internal nodes, they are not interested in 
an indel event that begins and/or ends in the middle of a block (as in Figure S2, panels b and 
c). Thus, they only consider those events that insert/delete the entire blocks in single steps. 

We now calculate the probabilities of the local PWAs that were considered in the 
cases (i)-(iv) in Section R2 (Figures SI), via the model of [36] . And we compare the results to 
those via our theoretical formulation under Dawg’s parameters (Eqs.(Rl .7-9)). In the 
following, the probabilities via Kim and Sinha will be calculated according to Eq.(2) and 
Figure 1C of their paper [36], and the probabilities via our formulation will be calculated 
according to the prescriptions in section SM-2 (and in Appendix A1 of Part II). We set 
t F ~t I = \b\ in the following calculations. (Here \b\ denotes the length of branch b). Via the 
model of [36], the PWA probability in case (i) is calculated as: 


P KS [case (i)] = (1 - P:f (1 - P D f , - Eq.(SM-9.1) 


where p, and p D are the “transition probabilities of the‘Insertion’and‘Begin deletion’ 
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states,” respectively. The “reference” PWA probability calculated via our formulation 
(Eqs.(R1.2,3)) is: 

P ref [case (/)] = exp(-A DflWg \b\- 2(X, + X D )\b[j 

r ™ ] — Eq.(SM-9.2) 

x l + iA Pi2) [case(i)] + 2 Jn _ 3 !J, P(n) [case(i)\ . 

Here, A Dawg is the abbreviation of the “universal” factor for the indel exit rate ( i.e ., 

A Dawg | A,, A d , f D (.) ] in Eq.(Rl .9)). And u P(2) [ case (z)] is concretely expressed in 
Eq.(SM-2.2a). Now, assuming that (A, + A D )\b\ is sufficiently small, we expand the 
expression in the square brackets into the power series in X, \b\ and X D \b \, which will 
collectively be denoted as X\b\ when considering the order of magnitude. From 

Eqs.(SM-2.2a,b’), we get p P(2) [case (/)] = ^ | Xjfj{l)X D f D {l)\b\/l + o[(X\b\f^j . Moreover, 

the expansion of p P(n) [y K ;[a(s A , s D ), | ( s A , t 2 )j generally starts with 0^(A|A>|)” j 

terms. Thus, we have: 

P ref [case (z)] = exp(-A° flM ’ g |A»| — 2(A 7 + A D )|fc|) 

x 1 + A,A d |fe| ; (2^/,(')/„(/) /z) + o((A|*|) J ) 

This and Eq.(SM-9.1) will provide the baseline when examining the probabilities via the 
HMM of Kim and Sinha in other cases. 

In case (ii), the PWA probability under the HMM of Kim and Sinha is: 

P KS [case (zz); A L A ] = (1 - p, ) 3 (1 - p D f p D Pr D (A L A ). - Eq.(SM-9.3) 

Here Pr D (/) is the “probability distribution on the deletion length (/),” which is assumed as 
shared among different branches. To facilitate the comparison, we consider the ratio of the 
probability in case (ii) to that in case (i), which yields: 

P KS [case (z‘z); AL' 4 ]/ P KS [ case (z)] = (1 - p,) p D Pr H (AL 4 ). — Eq.(SM-9.4) 

Meanwhile, the probability via our formulation is: 

P ref [case (zz); A L A ] = exp(-A DflM '* \b\- (A, + X D )(2 + A L A )\b\) 

a1 • — Eq.(SM-9.5) 

x \2j„-c Up<n ^ case ^ AL J 

Here, p P[l) [case (zz); AL 4 ] is given by Eq.(SM-2.4’), and p P(2) [case (zz); AL 4 ] is given by 
Eq.(SM-2.5a) supplemented with Eqs.(SM-2.5b,c,d’,e’). The ratio of Eq.(SM-9.5) to 
Eq.(SM-9.2) is: 


— Eq.(SM-9.2’) 
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P ref [case (ii); A L A ]/ P ref [case (/)] 


= e 


-(A / +A D )AL' 4 |Z?| 


+°° 

l+ 2j n _ 2 P P (n)\- CaSe (01 


r +°° a 

\2j n -^P(n)l CClSe W’ AL ' ] 

Expanding this expression into the power series in A|7>|, we have: 
M case 07); AL A ]/^ e / [ case 


. — Eq.(SM-9.6) 


= A c |&|/ d (AL a ) + tA d (A / + A d )|/?| 2 G d (AL a ) + o(a\b \?). 


Eq.(SM-9.6’a) 


Here G d (AL a ) is defined as: 
G D (AL A ) = -AL A f D (AL A ) + 

A, 


A r 


AL -1 


A/ + A d /=1 


J(AL A -/ + l)/ fl 0)/ D (AL A -/) 


— Eq.(SM-9.6’b) 


A/ + A d 


(AL A +1) 


/ 7 (/)/ d (AL a +/) . 


/=! 


(Figure 7 of [43] shows the ratio G d (AL a )/ f D (AL A ) as a function of A L A .) 

Similarly, via the HMM of [36], the ratio of the PWA probability in case (iii) to that 
in case (i) is expressed as: 


P KS [case (iii); AL r> ]/ P KS [case (/)] = — Pr^AE 0 ). — Eq.(SM-9.7) 

1 — Pi 


Here Pr 7 (/) is the “probability distribution on the insertion length ( l ),” which also is 
assumed as shared among different branches. The ratio via our formulation is obtained by the 
power-series expansion in A|7>| of Eq.(Al.l.l’) and Eq.(A1.1.2a) supplemented with 
Eqs.(A1.1.2b,c,d’,e’) (all in Appendix of [43]). The result is: 

P ref [case (iii); AL D ~^jP ref [case (/)] = [2: | a)' 1 '[case (iii); AL°]||l + ^ ^ u { p \case (z)]J 

= A, \b\f r (AL°) + U,(A 7 + k D )\b\- G,(AL d ) + o((A|A>|) 3 ) 

— Eq.(SM-9.8a) 

Here G,(AL D ) is defined as: 


G I (AL D ) — AL D f I (AL D ) + -^—'2 t (AL°-l+ l)fj(AL D -l)f t (l) 


AL -1 


A/ + A D w 


A n 


. — Eq.(SM-9.8b) 


A 7 + A fl 


(A L d + 1) 


fj(AL D + l)f D (l) 


i=\ 


(Thanks to the symmetry between the probabilities under the time reversal, Figure 7 of [43] 
also gives the ratio Gj(AL d )/ f r (AL D ) as a function of AL° , when calculated under the same 
setting.) 

Now we compare the results under Kim and Sinha’s HMM (Eq.(SM-9.4) and 
Eq.(SM-9.7)) with the corresponding results obtained via our formulation (Eq.(SM-9.6’a) and 
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Eq.(SM-9.8a)). In the method of [36] , the substitutions p t = c, \b\ and p D = c D \b\ are made 
first. Then c, and c D are estimated from the total frequencies of insertions and deletions, 
respectively, along the external branches observed from the input MSA. Similarly, Pr.( A/. \) 
and Pr D (A L A ) are estimated from the observed length histograms for insertions and deletions, 
respectively. Let {b} PE be the set of branches used for parameter estimations. Then, using the 
summations of the “reference” results, Eq.(SM-9.6’a) and Eq.(SM-9.8a), both over {b} PE , 
we expect to have: 

^1 = 1 + 2 G d (AL a ) +0{{X\b\f), — Eq.(SM-9.9a) 

K 2 11 \L A =l 

r -| jCO 

^L£d_l +Vt^.^ 1 ) 2 G,(AL d ) + o((A|/?|) 2 ), — Eq.(SM-9.9b) 

A / 2 11 AL° =1 

E’[Pr D (AL /1 )] = f D (AL A ) + b l) |fc| G d {AL a ) + 0((A I b I) 2 ), 

A d 2 

— Eq.(SM-9.9c) 

£[P r/ (AL D )] = f I (AL D ) + ^±^<1l) |H Gj(AL d ) + 0((A I 6 I) 2 ). 

— Eq.(SM-9.9d) 

Here £[X] denotes the expected value of the estimated parameter X , which is the average 
of estimated X over all indel processes under the given set of conditions (the tree and model 

parameters). And we also used the notation, (X(b )}^ 

[NOTE: The actual values of c, and c D estimated by the method of [36] may be slightly 
smaller than Eqs.(SM-9.9b,c), because the denominator in their method is the total number of 
MSA columns, instead of the average numbers of possible indel positions in ancestral 

sequences.] Usually, ^■(A / + A D )^|Z?|\ is quite small, at most O^10 _1 j and typically 


O (10 2 ). Thus, as long as the actual parameters, A,, A D , f l (AL n ) , and f D ( AL a ), do not 


considerably vary across branches, and provided that the MSA is sufficiently long and 
accurate, the estimated values of c, and c D , respectively, should approximate A, and X D 
fairly well. Also, under the same situation, the estimated values of Pr r (l) and Pr fl (/), 
respectively, should approximate / 7 (/) and f D {l) fairly well, as long as the ratios 


G,(J)/f,(l) and G D (l)/f D (l ) are sufficiently less than 


«A, + A 0 )(|*|) 


. However, it 


does not actually matter so much whether or not the estimated Kim-Sinha parameters (c 7 , c D , 
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Pr 7 (/) and Pr D (/)) approximate Dawg’s indel parameters (A 7 , A H , / 7 (/) and f D (l)) fairly 
well. What actually matters is how accurately Eq.(SM-9.4) and Eq.(SM-9.7) approximate 
Eq.(SM-9.6’a) and Eq.(SM-9.8a), respectively, using the estimated parameters. In an extreme 
case where all branches have the same branch length, the approximation should be nearly 

perfect. This is because, in this case, \b\ = ^ for all branches, and thus because we can 

use Eqs.(SM-9.9a-d) without any significant modifications to estimate the probabilities (not 
involving case (iv)). [It should be noted here that Eq.(SM-9.4) and Eq.(SM-9.7), respectively, 
contain extra multiplication factors, (1 - p,) and (1 - p 7 ) _1 , compared to the corresponding 
Eq.(SM-9.6’a) and Eq.(SM-9.8a). However, these factors should remain close to 1, because 

Pj = c 7 \b\ should normally be at most O^IO -1 j.] In contrast, the approximations by 


Eq.(SM-9.4) and Eq.(SM-9.7) could considerably deteriorate, e.g., when 




times the ratios, G D (A L A )/f D (A L A ) and G 7 (A L° )// 7 (AL° ) , 


respectively, become comparable to or greater than 1 (unity). Because G d (A.L a ) and 
Gj(AL d ) are mostly contributed from next-parsimonious local histories containing 
overlapping indels, we can interpret the result as follows. “Overlapping indels start to make 
Kim and Sinha’s method poorly approximate the (case (ii) and (iii)) local PWA probabilities 
when the involved gap is long and the branch lengths show a large variation.” Let’s assume 
that there is a good reason to believe that the Dawg indel parameters (A 7 , A 0 , / 7 (/) 
and f D {l)) are shared among all branches. Then, one way to mitigate the aforementioned 
effects of overlapping indels may be to set: 

c 0 (|fc|)Pr D (/, |fc|) = A D f D (l) + A d *l±*v\b\G D (l ), - Eq.(SM-9.10a) 


c 7 (|/?|)Pr 7 (/, \b\) = A 7 / 7 (/) + A 7 -LLo.\b\G I (I ), - Eq.(SM-9.10b) 


and to fit A 7 , A H , / 7 (/) and f D (l) according to these equations supplemented with 
Eqs.(SM-9.6’b,8b). Now, as indicated by Figure 7 of [43], under the power-law indel length 
distributions, the ratios G D (A L A )/ f D (AL A ) and G, (A L°)/f l (A L °) are less than 4 in 
absolute value when the gap is 300 residues long or shorter. Therefore, the 2nd-order terms 


will begin to substantially influence the results when 


k*,+4)(h-(H) w ) 


is larger than, 


say, 0.1. Such a situation will be quite rare in practical sequence analyses. Even if we 
encounter such a rare case, then local histories with more than 2 indels will begin to account 
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for a substantial fraction of the probability. Considering this way, we expect that the method 
of Kim and Sinha [36] will pretty well approximate the probabilities of local PWAs belonging 
to cases (ii) and (iii) as long as the branch lengths are reasonable for phylogenetic analyses. 

Finally, we consider case (iv). Indel histories giving rise to the local sequence states 
in this category are shown, e.g ., in Figure S3 (in this paper), and panel A of Figure 6 of [49] . 
In such a situation, an aligner will reconstruct a PWA that is like one of the two PWAs in 
Figure SI, panel d (if the reconstruction is correct). And Kim-Sinha’s method assigns a 
probability according to the reconstructed PWA. Whether it is like the left one or the right one 
in Figure Sid, the assigned probability is the same, and its ratio to the probability of case (i) 
is: 


P KS [case (iv); AL a , AL D ]jP KS [case (;')] = p, Pq(A L°) p D Pr D (AL A ). — Eq.(SM-9.11) 


Via our formulation, how to calculate the probability in this case was briefly described near 
the bottom of section SM-2. (And it is detailed in Appendix A1.2 of [43].) In this case, each 
parsimonious history consists of two indels, and each next-parsimonious history consists of 
three indels. Because there are as many as 24 types of next-parsimonious histories, here we 
only consider the parsimonious histories. Then, the lowest-order contribution of the 
multiplication factor, p, P(2) [case (iv); AL A , AL°] , is given by Eq.(Al .2.1a), supplemented with 
Eqs.(A1.2.1b,c,d,e’,f’,g’), all in [43]. [NOTE: In [43], u P(2) [case (iv); AT 4 , AL n ] is denoted 
as ftp'[case (iv)] .] Expanding each term into a power series in A | A , we get the following 
expression for the ratio: 


P ref [case (iv); AL a , AL d P ref [case (/)] 


= e 


~(A,+?i D )AL A \b\ 


ji P{n) [case (iv); AL, AL] 


[2 

= k D ft\b\ 2 4 f D (AL a ) ft(AL d ) + 
+o((A\b \) i ). 


1-1 


+0 ° 

(01 

fi(AL D +l)f D (AL A +/) 


1=0 


— Eq.(SM-9.12) 

In case (iv), as opposed to in cases (ii) and (iii), the PWA probabilities via the HMM of [36] 
differ considerably from that via our formulation even when « 1 and |/?| << 1 . Under 

these conditions, p I Pr I (AL D ) and p D Pr D (AL A ) quite accurately approximate A, |6| /)(AL /) ) 
and X D \b\f D (AL A ) , respectively (see Eqs.(SM-9.9c,d)). Hence, the o((A|(?|) 3 j terms in 


38 









Eq.(SM-9.12) can be neglected. Thus, we have: 


p 

r ref 

case (zv); A L A , A L° 

o 

1_1 

.eC 

P 

1 KS 

case (zv); A L A , A L° 

jP KS [case (/)] 


3 | L c °-AL a > ^ + ^ ^ + 

2 + tr f D (AL A )MAL D ) 


— Eq.(SM-9.13) 

Table 3 shows this ratio for representative cases. The second term on the right hand side of 
Eq.(SM-9.13) is the effect of overlapping indels (as in Figure S3, panels d and e). When 
A L a = A L d = 1, this term is expected to be quite small; for example, it is about 0.167 if 
//(/) = /z>( O'* r 1 ' 6 . And it gets more and more influential when AL 4 and/or A L° gets larger, 
and it substantially exceeds 1 (unity) in some cases (Table 3). Actually, a similar effect was 
incorporated in the HMM of Knudsen and Miyamoto [50]. Their HMM could only 
accommodate geometric indel length distributions. Consequently, the relevant term was 
independent of A L A and A L° . Coming back to Eq.(SM-9.13), the first term on the right 
hand side, 3/2, differs from 1 (unity) because the HMM of [36] does not fully take account of 
the non-overlapping indel histories (e.g., panels a-c of Figure S3), either. [NOTE: More 
precisely, their HMM takes account of contributions only from either panels a and b or panels 
a and c, depending on the local HMM it deals with, but not from all of panels a, b and c.] This 
error is actually shared by most of the standard, or nearly standard, HMMs and transducers 
used thus far as probabilistic models of indels (such as those cited in Background of Part I). 
Taking these results into consideration, a possible major improvement on the model of Kim 
and Sinha [36] would be achieved through modifying the HMM structure, so that the 
probability of an insertion and an immediately adjacent deletion (or that of the opposite 
configuration) will be given by Eq.(SM-9.13) or its extension that includes the terms of 

higher-orders in A|&|. 
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Supplementary tables 


Table SI. Various “threshold gap lengths” for local PWAs of cases (ii) and (iii) 


X = (X I+ X D )(t F -t i y 

(A L)Z n b 

(ADS ‘ 

(AA)£ c 

(adS 1 

0.01 indels/site 

128 

160 

>300 

>300 

0.04 indels/site 

31 

41 

99 

272 

0.1 indels/site 

12 

17 

42 

119 

0.2 indels/site 

6 

8 

22 

66 

Approximate relation d 

y- 1.2 /x 

Y ~ 1.6 /X 

Y-4/X 

11/X (?) 


NOTE: See section Ml of Methods for details on the parameter settings. Because of the 
symmetry under the time reversal when A, = k D , the identical results apply to the local 
PWAs in both case (ii) and case (iii). This table is adapted from Table 1 of [43] . 

a The expected number of indels per site. 

b The number of ancestral sites in between the PASs, i.e., AL a or AL° , at which the total 
next-parsimonious contribution is 1/2 (=0.5) of the total parsimonious contribution. 
c (AL)^®' is the value of A L A or A L° at which the total contribution from local histories 
involving up to (and including) N ID indels each account for 1/2 (=0.5) of the “exact” 
multiplication factor for the local PWA. 

d A rough (inversely proportional) relationship between each threshold gap length ( Y) and the 
expected number of indels per site (X ). 
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Table S2. Goodness of power-law for finite-time transition probabilities of case (iii) local 
PWAs 


A/: A d 

(A j + A D )(t F - tj) 

Correlation 

coefficient 

Exponent c 
(Std. Err.(y)) 

Coefficient d 
(Std .Err.(log A)) 


0.001 indels/site 

-0.9999998 

1.5994 

0.000223 




(5.8xl0“ 5 ) 

(9.6 xl0“ 5 ) 


0.01 indels/site 

-0.9999997 

1.5998 

0.00222 




(7.6xl0“ 5 ) 

(0.00013) 

1 : 1 

0.04 indels/site 

-0.9999946 

1.5993 

0.00881 




(0.00030) 

(0.00050) 


0.1 indels/site 

-0.999968 

1.5981 

0.0217 




(0.00074) 

(0.0012) 


0.2 indels/site 

-0.99989 

1.5955 

0.0421 




(0.0014) 

(0.0023) 


0.001 indels/site 

-0.9999998 

1.5998 

0.000111 




(5.2 xl0“ 5 ) 

(8.6xl0“ 5 ) 


0.01 indels/site 

-0.9999994 

1.6036 

0.00111 




(0.00011) 

(0.00017) 

1 : 3 

0.04 indels/site 

-0.999990 

1.6143 

0.00442 




(0.00042) 

(0.00069) 


0.1 indels/site 

-0.99994 

1.6340 

0.01090 




(0.0010) 

(0.0016) 


0.2 indels/site 

-0.99981 

1.6623 

0.0213 




(0.0019) 

(0.0029) 


0.001 indels/site 

-0.9999998 

1.5990 

0.000334 




(6.3xl0“ 5 ) 

(0.00010) 


0.01 indels/site 

-0.9999997 

1.5960 

0.00333 




(7.3xl0“ 5 ) 

(0.00012) 

3 : 1 

0.04 indels/site 

-0.999995 

1.5846 

0.01317 




(0.00028) 

(0.00047) 


0.1 indels/site 

-0.99997 

1.5636 

0.0323 




(0.00065) 

(0.0011) 


0.2 indels/site 

-0.99991 

1.5329 

0.0625 




(0.0012) 

(0.0021) 
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NOTE on Table S2 (previous page): This table shows the results of the correlation and 
regression analyses. The independent variable (X ) is log(AL), where A L is the local PWA 

size. The dependent variable ( Y ) is log|d / ; v '" =200/ [case (iii); A L; [/,, / ]]j, where 

~{N id = 200) [ case (■■■y 

'1] is the “exact” multiplication factor associated with a local 

PWA of case (iii). See subsection SM-3.1 of Supplementary methods for more details. See 
section Ml of Methods for the parameter setting. The results apply also to case (ii) local 
PWAs with due modifications. 

a X, is the total insertion rate per site per unit time. X D is the total deletion rate per site per 
unit time. 

b The expected number of indels per site. 

c The power-law exponent, i.e., y of the approximate power-law, u!^ m=mY/ \AL\ = A(AL)~ y . 

d The power-law coefficient, i.e., A of the approximate power-law, 

~(v ro = 2 °° ) [ AL ] „ A ( AL )-/. 
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Table S3. Correlation and regression analyses on absolute frequencies of local MSA 
homology structures 


Dataset 

Homology 

structures 

analyzed 

Number of 

homology 

structures 

Correlation 

coefficient 

Slope 

(Std. Err.) 

Y-intercept 

(Std. Err.) 

1A 

All 

3,396 

0.99958 

0.99064 

(0.00050) 

-0.229 

(0.014) 


Rare invisible 

indels a 

3,390 

0.99958 

0.99065 

(0.00050) 

-0.228 

(0.014) 

IB 

All 

11,157 

0.99752 

0.96467 

(0.00064) 

-0.706 

(0.016) 


Rare invisible 

indels a 

9,831 

0.99917 

0.97221 

(0.00040) 

-0.532 

(0.011) 


NOTE: The independent variable (X) is the square root of the actual absolute frequency of 
each homology structure in each simulated dataset. The dependent variable (Y) is the square 
root of the absolute frequency predicted by Eq.(SM-7.1) in Supplementary methods. We 
analyzed homology structures that are predicted to occur 5 times or more in each of the MSA 
sets 1A and IB. See section M2 of Methods for details on the simulations. This table is a 
modified version of Table 1 of [48], 

a Homology structures of local MS As each of which is expected to undergo less than one 
unobservable indel. 
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Table S4. Correlation and regression analyses on relative frequencies of correct 
parsimonious indel histories 


Dataset 

Parsimonious indel 

histories used 

Number of 

histories a 

Correlation 

coefficient 

Slope 

(Std. Err.) 

Y-intercept 

(Std. Err.) 


All 

317,400 

0.999948 

1.0105 

(0.0025) 

-0.0045 

(0.0011) 

1A 

Most likely (ML) 

119,676 

0.999793 

0.9987 

(0.0049) 

0.0058 

(0.0038) 


Least likely (LL) 

119,856 

0.999683 

1.0105 

(0.0090) 

-0.0097 

(0.0024) 


All 

7,252,601 

0.999967 

1.0132 

(0.0019) 

-0.00153 

(0.00023) 

IB 

Most likely (ML) 

917,499 

0.999848 

0.99911 

(0.00411) 

0.0125 

(0.0032) 


Least likely (LL) 

925,036 

0.999441 

0.99905 

(0.0118) 

-0.0136 

(0.0017) 


All 

152,051 

0.9994 

0.9839 

(0.0081) 

0.0055 

(0.0035) 

3P 

Most likely (ML) 

70,041 

0.9966 

1.021 

(0.020) 

-0.023 

(0.013) 


Least likely (LL) 

70,041 

0.9991 

1.036 

(0.016) 

-0.0032 

(0.0046) 

3M 

All 

808,462 

0.999988 

0.9986 

(0.0011) 

-0.00016 

(0.00018) 


Most likely (ML) 

181,501 

0.9994 

1.017 

(0.008) 

-0.017 

(0.007) 


Least likely (LL) 

181,496 

0.99983 

1.034 

(0.0067) 

-0.00037 

(0.00036) 

3F 

All 

2,355,122 

0.999990 

1.0028 

(0.0010) 

-0.00013 

(0.00004) 

3F 

Most likely (ML) 

140,485 

0.99987 

1.0051 

(0.0037) 

-0.0024 

(0.0031) 

3F 

Least likely (LL) 

140,478 

0.99985 

1.0060 

(0.0062) 

-0.0019 

(0.0007) 
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NOTE on Table S4 (previous page): Here we analyzed all instances of local MS As in each of 
which one of 2 or more parsimonious indel histories was “correct.” The independent variable 
(X) is the simulated proportion that the alternative parsimonious indel histories in each bin 
actually yielded the corresponding homology structures of the local MS As (“simulated 
relative frequency”). The dependent variable (Y) is the average of the predicted relative 
probabilities of the histories in each bin (“predicted relative frequency”). Note that weighted 
analyses were conducted. For details on the simulations and the correlation/regression 
analysis, see section M2 of Methods and SM-8 of Supplementary methods, respectively. The 
upper half of this table was adapted from Table 2 of [48], 

a The number of instances of alternative parsimonious histories of the specified type 
(All/ML/LL). 
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Supplementary figures (with legends) 
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b Case (ii) C Case (iii) 


A 


L 

- 

- 

R 

D 


L 

Hi 

U 2 

R 


A 


L 

1 

2 

3 

R 

D 


L 

- 

- 

- 

R 


r ~\ 

or 

V J 

Figure SI. Four types of local gap configurations in PWA between ancestral and 
descendant sequences. 

3 Case (i). b Case (ii) with A L A = 3. C Case (iii) with A L° = 2 . d Case (iv) with A L A = 3 
and A L° = 2 . 

In each PWA, each site (a cell) is assigned an ancestry. In the leftmost column of each PWA, 
the boldface italic ‘A’ and ‘D’ stand for an ancestor ( s A ) and a descendant ( s D ), respectively. 
The boxes shaded in magenta and cyan represent unpreserved ancestral sites and inserted 
descendant sites, respectively. In panel d, the PWA on the right (in parentheses) is equivalent 
to the PWA on the left, as far as the homology structure alone is concerned. This figure was 
adapted from Figure 2 of [43], 
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Figure S2. Parsimonious and next-parsimonious indel histories yielding case (ii) local 
PWA. 

3 The parsimonious history, consisting of a single deletion, b and C Examples of 
non-parsimonious indel histories. That in b consists of two consecutive deletions. That in c 
consists of an insertion and a subsequent deletion. 

Each of these indel histories yields the local PWA in Figure SI, panel b. The boxes shaded in 
magenta and red represent ancestral sites to be deleted. The yellow-shaded boxes represent 
inserted sites to be deleted. 


48 




















































a [M D (2,4),M,(1,2)] 


b [M,(4,2),M D (2,4)] 


C [M,( 1,2),M d (4,6)] 


('1 

L 

1 

2 

3 

R 

♦ 





L 

R 


(s a \M d (2,4) 

♦ 

S N 



<*1 

L 

4 

5 

R 



-(s„|M,(l, 2 ) 


(*1 

L 

1 

2 

3 

R 


* 


''' 



L 

1 

2 

3 

4 

5 R 

= (/ M,(4,2) 

♦ 




L 

4 

5 

R 



= (5„|M d (2,4) 


<*1 

L 

1 

2 

3 

R 


♦ 




(•'cl 

L 

4 

5 

1 

2 

3 R 

\ 

\ 

\ 

\ 

CN 

II 


('1 

L 

4 

5 

R 



= (s r |M D (4,6) 


d [m,(4,3),M d (2,5)] 


(*1 

L 

1 

2 

3 

R 


♦ 





L 

1 

2 

3 

6 

4 5 R 

= (/M,( 4,3) 

♦ 



(*"1 

L 

4 

5 

R 



= (s,|M d (2,5) 


e [M,(1,3),M D (4,7)] 


(*1 

L 

1 

2 3 

R 


♦ 




(■M 

L 

4 

5 

6 

1 

2 3 R 

= (s A M,(l,3) 

♦ 



<*1 

L 

4 

5 

R 



= {s,\M d (4,1) 


Figure S3. Parsimonious indel histories yielding case (iv) local PWA. 

Each panel shows a parsimonoius indel history that results in the local PWA in Figure SI, 
panel d. Panels 3, b and C exhaust the histories with non-overlapping indels. Panels d and e 
exemplify the histories with overlapping indels. The boxes shaded in magenta, cyan and 
yellow, respectively, represent ancestral sites to be deleted, descendant sites that were inserted, 
and inserted sites to be deleted. 
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Figure S4. Goodness of first approximation for case (ii) (and (iii)) local PWAs. 

The graph shows the ratio of the total next-parsimonious contribution (by 2-indel histories) to 
the total parsimonious contribution (by 1-indel histories) for case (ii) or (iii) local PWAs, as 

the function of the number of sites (AL' 1 in case (ii) and A L° in case (iii), abscissa) and the 
distance ((A, + A D )(t F -t,) indels/site, different curves). See section Ml of Methods for the 
parameter setting. This figure is a modified version of Figure 3A of [43] . 
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A Flowchart 


(O) 

(ia) 


(ib) 


(ic) 


(iv) 



B Pre-processing steps 

(o) Input data 

Tree MSA 

1 ATC-CAGAC—GA 

2 AGCGTTCACACT-GC 

3 ATAGA—AGAGTATC 

4 ATC-A—AGTGTATC 


(ib) Decomposing into 
“gap-pattern blocks” 
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(ia) Reducing MSA to binary pattern 


1 111000111110011 
2 111111111111011 

3 111110011111111 

4 111010011111111 


(ic) Sorting blocks into gapless segments 
and gapped segments 




Figure S5. Overall workflow of our algorithm to calculate first-approximate ab initio 
MSA probability. 

The entire algorithm consists of steps (ia), (ib), (ic), (ii) and (iii), processing the input (o) into 
the final output at step (iv). A The flowchart. B The schematic illustration of the 
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pre-processing steps (ia-ic). The input data [ (o) ] consists mainly of a MSA (of DNA 
sequences here) and a phylogenetic tree of the aligned sequences (labeled with boldface 
numbers). An evolutionary model via indels is assumed to be given but is omitted here. Step 
(ia) reduces the input MSA to a binary 1/0 pattern, in which 1 and 0 represent the “presence” 
(of a residue) and the “absence” (i.e., a gap), respectively. Step (ib) decomposes the binary 
pattern into “gap-pattern block”s, or “block”s for short, each of which consists of contiguous 
columns of a given 1/0 pattern. Here each block is represented as a rectangular array of 
neighboring cells with a particular color. Step (ic) sorts the blocks into gapless segments and 
gapped segments. Each gapless segment is represented as contiguous blue cells enclosed by a 
blue rectangle labeled B k (with k = 0,1,2). And each gapped segment is represented as 

contiguous cells enclosed by a red rectangle labeled C° K (with K = 1,2). See section SM-5.1 
for more details. [NOTE: The set of all gapped segments, jc^j^ , is a subset of 

|C K } , which is the set of all regions that can accommodate local indel histories along 

l. J K—l,2,...,K max 

the tree.] This figure was adapted from Figure 1 of [48]. 
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Block a Block b 





Local history 




a b c 


Block a Block b Block c 




Local history 



Figure S6. Merging indel events in effectively contiguous gap-pattern blocks. 

In each panel, a gapped segment consisting of contiguous gap-pattern blocks (“block”s), and a 
phylogenetic tree of aligned sequences (left) is given. Then, the Dollo parsimonious history 
for each block is first inferred (middle). Second, the indel histories in the effectively 
contiguous blocks are merged if they are of the same type and occur along the same branch 
(right). As in Figure S5 B, a “1” and a “0” represent the presence state (/.<?., a residue) and the 
absence state (i.e., a gap), respectively. Note that each column under the “Blocks” (left) 
represents a gap-pattern block, and not necessarily a single column, in the MSA. In the indel 
histories in the middle step, “+x” and “-y” represent the insertion of block “x” and the 
deletion of block “y”, respectively. In the local indel histories in the final step (on the right), 
blocks in the same parentheses after the “+” or the sign, respectively, are inserted or 
deleted simultaneously. A Merging indel events in literally contiguous blocks. B Merging 
indel events in two blocks separated by a (run of) block(s) in which no downstream nodes 
with the “presence” state interrupt the merger. C In this case, the deletions of block a and 
block c, both along the exterior branch leading to sequence 1, cannot be merged because they 
are interrupted by the downstream node with the “presence” state (the red “1”) in block b. 

This figure was adapted from Figure 2 of [48] . 
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Tree 



Blocks 



a b c 



B Initial history 




C Alternative history 



F Parsimonious history 



(c) 

(a) 

( a,b,c) 
( a,b,c) 


G Block-wise representation of E 
Block a Block b Block c 





H “Branching” the deletion on block b 
Block a Block b Block c 



Figure S7. Looking for parsimonious local indel histories. 

For the gap-configuration (under the “Blocks”) and the tree shown in A, the initial step infers 
the history in B, but there is actually another parsimonious history (C). For the 
gap-configuration and the tree shown in D, using the history in E as an “intermediate” point 
always reachable from the initial history, we can find the actual parsimonious history shown 
in F. Panels G and H exemplify a “branch-and-merge” operation performed on the situation 
in D. G Looking closely at the indel history in E, we see that a deletion of a subsequence in 
block b occurs along the branch of the common ancestor of sequences 1 and 2. With this 
history as a starting point, in the “branching” step (H), the deletion is re-interpreted as 
deletions along the child branches. Finally, merging the resulting deletions with the 
effectively contiguous deletion(s) gives the local indel history in F in this example. This 
figure was adapted from Figure 3 of [48], 
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Initial history 




. (C) 
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C Re-interpreting the event (on block b) 
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D Sorting the ‘deletions’ 
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3. 1 

-(c):(2) 
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Figure S8.4. Sorting indel events that will undergo “branch-and-merge” processes. 

A The initial local indel history (right), given a gapped segment (under the “Blocks”) and a 
sequence tree (left). B If the input tree is rooted, it gets unrooted. C Then, an insertion event 
(as in block b in this example) can be re-interpreted as a ‘deletion’ event by reversing the 
(virtual) time direction (represented by a blue arrow). Here, “+(&):(4)” denotes the insertion of 
block b into sequence 4, and “-(b):(l,2,3)” denotes the ‘deletion’ of block b from (the ‘last 
common ancestor’ of) sequences 1,2, and 3. Similarly, “+(a):(3,4)” in the original history will 
also be re-interpreted as “-(a):(l ,2).” D In this way, we can re-interpret all the indel events as 
‘deletions’ (left), and sort them in descending order of the number of sequences undergoing 
the ‘deletion’ (right). This figure was adapted from Figure 4 of [48], 
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a Set 3P (12 primates) 


b Set 3M (15 mammals) 


C Set 3F (9 fast-evolving mammals) 




0J015 

0.162 


0J016 

0.217 


Figure S9. Phylogenetic trees used for simulated DNA sequence evolution. 

3 The tree of 12 primates, used for Set 3P. b The tree of 15 mammals, used for Set 3M. C 
The tree of 9 fast-evolving mammals, used for Set 3F. This figure was adapted from Figure 2 
of [38] . See [38] for more details on these trees. See section M2 of Methods for the settings 
for the simulations. 
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Figure S10. Simulation analyses on relative frequencies among local indel histories, 
using Set 3P (a) and Set 3F (b). 

The same notation and convention apply as those for Figure 7. 
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Figure Sll. Problem with long gaps and its solution by hierarchical partitioning. 

a A gapped segment with long gaps (red) often contains lots of short gaps (blue). This makes 
the simple partitioning less effective, b A coarse-grained partitioning according only to the 
configuration of long gaps, chopping a gapped segment into a number of sub-segments (I, II 
and III in this example), c A broad indel history resulting in the configuration of long gaps, d 
Fine-grained partitioning of the Sub-segment I in panel b according to the configurations of 
short gaps. This measure decomposes the ‘big’ problem into a set of sub-problems. The ‘big’ 
problem needs to consider potentially numerous indel histories, whereas each sub-problem 
needs only to consider a few candidate histories. Ignoring the sequence containing the long 
gap enables this measure. The ignored sequence is indicated by the parenthesized sequence ID 
and the dotted branch leading to it. Each column in this figure should be regarded as a 
gap-pattem block rather than a single site. This figure was adapted from Figure 30 of [48]. 
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